Re: Searching for a string in a text buffer with a regular expression

maxpat78 Mon, 09 Dec 2013 00:01:44 -0800

I mean a code fragment like this:

        foreach(i; 1..2085)
        {

                // Bugbug: when we read in the buffer, we can't know anything about its encoding...
                // But REGEX could fail if it contained unknown chars!
                Latin1String buf;
                string s;


                try
                {
                        buf = cast(Latin1String) read(format("psi\\psi%04d.htm", i));
                        transcode(buf, s);
                }
                catch (Exception e)
                {
                        writeln("Last record (", i, ") reached.");
                        exit(1);
                }

                // Exception "Invalid UTF-8 sequence @index 1" in file 55

                enum rx = ctRegex!(`<p class="aggiornamentoAlbo">.+?</div>`, "gs");

                auto m = match(s, rx);

                if (! m.empty())
                {

                        if (indexOf(m.captures[0], "xxxxxxxx", 0) > -1 &&
                            indexOf(m.captures[0], "1983", 0) > -1)
                                writeln(m.captures[0]);
                }
        }

The question is: what kind of cast should I use to safely (i.e. without conversion exceptions being raised) scan any possible kind of textual (or binary) buffer, like in Python 2.7.x?
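
To make the goal concrete, here is a minimal sketch of the kind of thing I am after. It is not a confirmed solution. It assumes it is acceptable to treat every byte as the Latin-1 character with the same value, so no input can ever be "invalid". The helper name latin1ToUtf8 and the in-memory sample buffer are made up for illustration; the real code would get its bytes from std.file.read.

```d
import std.array : appender;
import std.regex : matchFirst, regex;
import std.stdio : writeln;
import std.string : indexOf;

/// Hypothetical helper: widen each byte to the code point with the same
/// value (the Latin-1 mapping) and encode it as UTF-8. Every byte value
/// maps to a valid code point, so this cannot throw a decoding exception.
string latin1ToUtf8(const(ubyte)[] bytes)
{
    auto app = appender!string();
    foreach (b; bytes)
        app.put(cast(dchar) b); // the appender encodes each dchar as UTF-8
    return app.data;
}

void main()
{
    // Made-up sample standing in for cast(ubyte[]) read(path).
    // It contains the byte 0xE9, which is not valid UTF-8 on its own.
    ubyte[] raw = cast(ubyte[]) `<p class="aggiornamentoAlbo">caf`.dup;
    raw ~= cast(ubyte) 0xE9;
    raw ~= cast(ubyte[]) ` 1983</div>`.dup;

    string s = latin1ToUtf8(raw);

    // Same pattern as above; "s" lets '.' match newlines too.
    auto m = matchFirst(s, regex(`<p class="aggiornamentoAlbo">.+?</div>`, "s"));
    if (!m.empty && indexOf(m.hit, "1983") > -1)
        writeln(m.hit);
}
```

The trade-off is that text which was really UTF-8 (or some other encoding) comes out as mojibake for non-ASCII characters, but ASCII patterns such as the HTML tags and digits above still match, which is all the search needs.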


Thanks!
