Having suffered too many wasted hours on encoding detection, I can
sum up the best practice in a few sentences:
1) Know your input encoding
2) Know your output encoding requirements
3) If you're guessing ("detecting"), you're going to have some pretty
(un)funny results, especially if you can only narrow down your source
to "anything from anywhere."
If you have an app with a mishmash of encodings, I'd highly recommend
putting your effort into moving input and output to UTF-8 rather than
into encoding detection.
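
As a rough sketch of points 1 and 2 (not from the original thread; the
ISO-8859-1 source and the sample string are assumptions for
illustration), decoding once on the way in and encoding once on the way
out with the core Encode module might look like this:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption for this sketch: the source is known to send ISO-8859-1.
my $input_encoding = 'ISO-8859-1';

# Sample bytes as a legacy source might send them: "cafe" with an
# e-acute, which is the single byte 0xE9 in ISO-8859-1.
my $bytes_in = "caf\xE9";

# 1) Decode once, at the input boundary, into Perl characters.
my $text = decode($input_encoding, $bytes_in);

# Inside the app, work with characters rather than bytes.
my $char_count = length $text;

# 2) Encode once, at the output boundary, to UTF-8 bytes.
my $bytes_out = encode('UTF-8', $text);

printf "%d characters in, %d bytes out\n", $char_count, length $bytes_out;
```

Nothing here is guessed: the input encoding is a stated fact about the
source, and the output encoding is a stated requirement.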
-Micah
On Dec 2, 2007, at 8:11 AM, Jonathan Rockway wrote:
On Sun, 2007-12-02 at 16:41 +0200, Angel Kolev wrote:
Hi again :) I think I found a solution. With Encode::Detect I can do:
    use Encode;
    require Encode::Detect;
    # decode() returns Perl characters, not UTF-8 bytes
    my $utf8 = decode("Detect", $data);
Looks like this module uses Mozilla's encoding detector, which does a
pretty good job in my experience. If you're trying to guess the
encoding of small pieces of text, though, this method probably won't
work. The best thing to do is to ask the user what encoding he's
using, or mandate UTF-8.
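
One way to mandate UTF-8 is to decode strictly and refuse anything
that is not well-formed, instead of guessing. This is only a sketch,
not code from the thread; the helper name decode_utf8_or_reject is
hypothetical, and it uses nothing beyond the core Encode module:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns the decoded text, or undef when the
# bytes are not well-formed UTF-8. It never tries another encoding.
sub decode_utf8_or_reject {
    my ($bytes) = @_;

    # FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps
    # decode() from modifying $bytes. eval turns the die into undef.
    my $text = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    return $text;
}

my $good = decode_utf8_or_reject("caf\xC3\xA9");    # valid UTF-8
my $bad  = decode_utf8_or_reject("caf\xE9");        # ISO-8859-1 bytes

print defined $good ? "accepted\n" : "rejected\n";
print defined $bad  ? "accepted\n" : "rejected\n";
```

When the helper returns undef, the application can ask the user to
resubmit as UTF-8 or to say which encoding was used.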
Regards,
Jonathan Rockway
_______________________________________________
List: [email protected]
Listinfo: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
Searchable archive: http://www.mail-archive.com/[EMAIL PROTECTED]/
Dev site: http://dev.catalyst.perl.org/