Hi,

I am trying to read a UTF-8 encoded file, decode its HTML character entities,
and write the result to another UTF-8 encoded file.
The program works fine if I keep this line, which stops after the first 200 lines:
$t++; last if $t > 200;

If I comment out that line (to parse the entire file, and not only the first
200 lines), the program finishes its job, but the created file is not
written correctly.
The result2.txt file does seem to be UTF-8 encoded after writing, because
the TextPad editor alerts me that it contains special characters when I try
to open it, but some UTF-8 encoded characters are printed as 2 characters
instead of a single correct one.

What could be wrong? Why is the output incorrect only when the whole, bigger
file is processed? Or am I doing something wrong?

If you want to test this, the source file result.txt can be downloaded from:

http://www.tranzactiibursiere.ro/result.zip
(The file is 2.99 MB.)

Thank you very much.

Here is the code:

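use strict;
use warnings;
use HTML::Entities;

# Open both files with the :utf8 layer and stop if either open fails
open(OUT, ">:utf8", "result2.txt") or die "Cannot open result2.txt: $!";
open(IN, "<:utf8", "result.txt") or die "Cannot open result.txt: $!";
my $t = 0;
while (<IN>) {
    #$t++; last if $t > 200;
    print OUT decode_entities($_);
}
close IN;
close OUT;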

Teddy

