I mean a code fragment like this:

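        import core.stdc.stdlib : exit;
        import std.algorithm.searching : canFind;
        import std.encoding : Latin1String, transcode;
        import std.file : FileException, read;
        import std.format : format;
        import std.regex : ctRegex, matchAll;
        import std.stdio : writeln;

        // The pattern is built at compile time; "s" lets '.' match newlines.
        enum rx = ctRegex!(`<p class="aggiornamentoAlbo">.+?</div>`, "s");

        foreach (i; 1 .. 2085)
        {
                // Bugbug: when we read in the buffer, we can't know anything about its encoding...
                // But REGEX could fail if it contained unknown chars!
                Latin1String buf;
                string s;

                try
                {
                        buf = cast(Latin1String) read(format("psi\\psi%04d.htm", i));
                }
                catch (FileException e)
                {
                        // A missing file marks the end of the numbered series.
                        writeln("Last record (", i, ") reached.");
                        exit(1);
                }

                // Decode Latin-1 into UTF-8 (outside the try, so a failure here
                // is not misreported as "last record reached").
                transcode(buf, s);

                // Exception "Invalid UTF-8 sequence @index 1" in file 55
                foreach (m; matchAll(s, rx))
                {
                        if (m.hit.canFind("xxxxxxxx") && m.hit.canFind("1983"))
                                writeln(m.hit);
                }
        }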

The question is: what kind of cast should I use to safely (i.e. without raising conversion exceptions) scan any kind of textual (or binary) buffer, as one can in Python 2.7.x?
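One possible approach, sketched here under the assumption that the files really are Latin-1 (or that every byte should simply be kept as one character, roughly what a Python 2.7 `str` does): decode each byte to the Unicode code point with the same value. Latin-1 maps bytes 0x00 to 0xFF one-to-one onto U+0000 to U+00FF, so this step cannot fail, and the result is always valid UTF-8 that std.regex accepts. The helper `latin1ToUtf8` and the sample input are hypothetical, for illustration only.

```d
import std.algorithm.searching : canFind;
import std.conv : to;
import std.regex : ctRegex, matchAll;
import std.stdio : writeln;

// Hypothetical helper: Latin-1 decoding by widening each byte to the
// code point with the same value. Every byte is a valid Latin-1
// character, so there is nothing that can throw here.
string latin1ToUtf8(const(ubyte)[] raw)
{
    dchar[] decoded;
    decoded.reserve(raw.length);
    foreach (b; raw)
        decoded ~= cast(dchar) b;
    return decoded.to!string; // re-encode the code points as UTF-8
}

void main()
{
    // Hypothetical input: 0xE0 ('à' in Latin-1) is not valid UTF-8 on its
    // own, so casting these bytes straight to string would break std.regex.
    ubyte[] raw = cast(ubyte[]) `<p class="aggiornamentoAlbo">citt`.dup;
    raw ~= cast(ubyte) 0xE0;
    raw ~= cast(const(ubyte)[]) " 1983</div>";

    string s = latin1ToUtf8(raw);
    enum rx = ctRegex!(`<p class="aggiornamentoAlbo">.+?</div>`, "s");
    foreach (m; matchAll(s, rx))
        if (m.hit.canFind("1983"))
            writeln(m.hit);
}
```

If the real encoding is unknown, this only guarantees that matching never throws; non-Latin-1 text (e.g. UTF-8 files) would come out as mojibake, but ASCII markup such as the `<p class=...>` tags still matches correctly.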

Thanks!
