On Wed, 2011-11-02 at 21:03 +0200, goran kent wrote:
> With precisely this in mind, my code does some gymnastics to try and
> make sure bad utf8 doesn't make it in. But,... you never know when
> dealing with the vagaries of the 'tubes.

It's not uncommon for web sites to lie about the encoding of the content
they serve up. In particular, ASCII, UTF-8, ISO-8859-1 and CP1252 are all
completely interchangeable - up to the point where they're not.

My Encoding::FixLatin module (https://metacpan.org/module/Encoding::FixLatin)
is designed to help in dealing with that sort of situation, and especially
the case where a single document contains bytes from more than one encoding.

Cheers
Grant
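
P.S. To make the "more than one encoding in a single document" case concrete, here is a rough sketch of the general idea. It is written in Python rather than Perl, it is not the actual implementation of Encoding::FixLatin, and the name fix_mixed_bytes is made up for illustration. The assumption is simply: keep ASCII as-is, keep anything that is already a well-formed UTF-8 sequence, and treat any other high byte as CP1252 (falling back to ISO-8859-1 for the few bytes CP1252 leaves undefined).

```python
import re

# Well-formed multi-byte UTF-8 sequences (2 to 4 bytes).
# Overlong forms and UTF-16 surrogates are deliberately excluded.
_UTF8_SEQ = re.compile(
    rb"[\xC2-\xDF][\x80-\xBF]"
    rb"|\xE0[\xA0-\xBF][\x80-\xBF]"
    rb"|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}"
    rb"|\xED[\x80-\x9F][\x80-\xBF]"
    rb"|\xF0[\x90-\xBF][\x80-\xBF]{2}"
    rb"|[\xF1-\xF3][\x80-\xBF]{3}"
    rb"|\xF4[\x80-\x8F][\x80-\xBF]{2}"
)


def fix_mixed_bytes(data: bytes) -> str:
    """Hypothetical helper: decode bytes that mix ASCII, UTF-8 and CP1252."""
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte < 0x80:
            # Plain ASCII means the same thing in all four encodings.
            out.append(chr(byte))
            i += 1
            continue
        match = _UTF8_SEQ.match(data, i)
        if match:
            # Already valid UTF-8: keep the character it encodes.
            out.append(match.group().decode("utf-8"))
            i = match.end()
            continue
        try:
            # Stray high byte: assume it is CP1252.
            out.append(bytes([byte]).decode("cp1252"))
        except UnicodeDecodeError:
            # A few bytes (e.g. 0x81) are undefined in CP1252;
            # fall back to the ISO-8859-1 code point.
            out.append(chr(byte))
        i += 1
    return "".join(out)


if __name__ == "__main__":
    # UTF-8 "é", then a Latin-1 "é", then CP1252 curly quotes.
    print(fix_mixed_bytes(b"caf\xc3\xa9 caf\xe9 \x93hi\x94"))
```

Note that this is a guess, not a proof: a pair of Latin-1 bytes that happens to form a valid UTF-8 sequence (such as 0xC3 0xA9) will always be read as UTF-8, which is usually, but not always, what the author meant.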
