HTML::Parser modifies unicode characters

Moshe Kaminsky Sat, 11 Sep 2004 12:38:19 -0700

Hi,

It appears that HTML::Parser modifies some Unicode characters while 
parsing. The following program gives an example:


#########

#!/usr/bin/perl
use HTML::Parser;
use utf8;
open TEST, '>:utf8', 'word.txt';
my $p = new HTML::Parser text_h => [sub {print TEST shift}, 'text'];
$p->parse("zespołów\n");
close TEST;

#########

After running it, 'word.txt' contains "zespołów" with the funny l and 
the funny o following it transformed to something else. What am I doing 
wrong?
I'm running: perl 5.8.5, HTML::Parser version 3.36 on Linux.
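
To pin down exactly which characters were changed, it may help to dump 
the code points of what ended up in the file. The following is only a 
minimal sketch, not part of the original program: the helper name 
codepoints is made up for illustration, and it assumes 'word.txt' is in 
the current directory and was written with the ':utf8' layer as above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original program): render every
# character of a string as a U+XXXX code point, separated by spaces.
sub codepoints {
    my ($str) = @_;
    return join ' ', map { sprintf('U+%04X', ord($_)) } split(//, $str);
}

# Assumption: word.txt is the file produced by the program above.
my $file = 'word.txt';

# Read the file back through the same :utf8 layer used for writing,
# so that each character (not each byte) is reported.
if (open my $in, '<:utf8', $file) {
    while (my $line = <$in>) {
        chomp $line;
        print codepoints($line), "\n";
    }
    close $in;
}
else {
    warn "Cannot open $file: $!\n";
}
```

If the word came through intact, the funny l and the funny o would show 
up as U+0142 and U+00F3. Anything else in those positions shows what the 
characters were turned into.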

Thanks,
Moshe
