Ben Hiebert wrote:
Perl usually tries to guess at the best encoding when it takes in the data and then encodes it internally as best it can. You may have a problem where the data comes in as ISO-8859-1 but Perl thinks it is UTF-8, encodes it internally as UTF-8, and then prints the UTF-8 out as if it were ISO-8859-1, which gives you the bad results.

Yes, that is my guess too.
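The mix-up described above can be reproduced in isolation. Below is a minimal sketch using the core Encode module; the sample string "café" is purely illustrative. The UTF-8 encoding of "é" is two bytes, and reading those bytes back as ISO-8859-1 turns them into two separate characters, which is the classic symptom of UTF-8 being displayed as ISO-8859-1.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Sample text: "café", where "é" is U+00E9
my $text = "caf\x{e9}";

# Encoded as UTF-8, "é" becomes the two bytes 0xC3 0xA9
my $utf8_bytes = encode('UTF-8', $text);

# Misreading those UTF-8 bytes as ISO-8859-1 maps each byte to its own
# character, so "é" turns into U+00C3 followed by U+00A9
my $garbled = decode('iso-8859-1', $utf8_bytes);

# Print code points rather than raw text so the output stays ASCII-only
print join(' ', map { sprintf('U+%04X', ord $_) } split //, $garbled), "\n";
```

Running it prints `U+0063 U+0061 U+0066 U+00C3 U+00A9`: four characters went in and five came out.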

It may be worth checking what format Perl thinks your incoming data is in by using:
$flag = utf8::is_utf8(STRING);

Good idea. I modified the code to this:

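# Copy the uploaded file to FILE in 32 KB chunks
while (read($fdat{efilename}, $buffer, 32768)) {
        # Print a "u" to OUT for every chunk Perl has flagged as UTF-8
        if (utf8::is_utf8($buffer)) {
                print OUT "u";
        }
        print FILE $buffer;
}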

...but in both cases (working and not) I never get the "uuuuu" output.
But when $buffer is written to disk, it is transformed! I tried calling
binmode FILE just after opening the file for output, but the same
thing happens.
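One possible reason the "u" never appears: if the upload handle has no `:encoding` or `:utf8` layer, `read()` returns plain byte strings, so `utf8::is_utf8` reports false for every chunk. The flag describes how Perl stores a string internally, not which encoding the original data was in. The sketch below shows what an untouched copy should look like, with `binmode` applied to both the source and the destination. `copy_binary` is a hypothetical helper, and in-memory filehandles stand in for `$fdat{efilename}` and `FILE`; it does not diagnose the transformation seen here, it only gives a known-good baseline to compare against.

```perl
use strict;
use warnings;

# Hypothetical helper: copy everything from $in to $out in 32 KB chunks,
# with both handles in raw (binary) mode so no layer can alter the bytes.
# Returns the number of chunks whose internal UTF-8 flag was set.
sub copy_binary {
    my ($in, $out) = @_;
    binmode $in;
    binmode $out;
    my ($buffer, $flagged) = ('', 0);
    while (1) {
        my $got = read($in, $buffer, 32768);
        die "read failed: $!" unless defined $got;
        last if $got == 0;    # end of input
        $flagged++ if utf8::is_utf8($buffer);
        print {$out} $buffer or die "write failed: $!";
    }
    return $flagged;
}

# In-memory stand-ins for the upload and the file on disk:
# ISO-8859-1 "café", a space, then UTF-8 "café"
my $upload = "caf\xe9 caf\xc3\xa9";
my $saved  = '';
open my $in,  '<', \$upload or die $!;
open my $out, '>', \$saved  or die $!;
my $flagged = copy_binary($in, $out);
close $out;

print $saved eq $upload ? "identical\n" : "changed\n";
print "flagged chunks: $flagged\n";
```

This prints `identical` and `flagged chunks: 0`. If the real copy differs from that baseline, something other than `read()`/`print` (a layer pushed elsewhere, or processing before the handle is read) is changing the data.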

If Perl thinks it is UTF-8, then it is misinterpreting your incoming data, and you'll need to decode it, either with decode() or with one of the other UTF-8 utilities. This may work:

use Encode qw(decode);
$GoodInternalString = decode("iso-8859-1", $IncomingData);

That's what I use when the file *is* iso-8859-1.
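The usual pattern around that `decode` call is to decode once where data enters the program and encode once where it leaves. Below is a minimal sketch with Encode; the sample bytes are illustrative, and UTF-8 as the output encoding is an assumption, so use whatever the destination actually expects. Opening a file with a `:encoding(iso-8859-1)` layer achieves the same input-side decoding without an explicit `decode` call.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw bytes as they might arrive from an ISO-8859-1 file (0xE9 is "é")
my $incoming = "caf\xe9";

# Decode once, at the input boundary, into Perl characters
my $good_internal_string = decode('iso-8859-1', $incoming);

# ... work with characters here ...

# Encode once, at the output boundary (UTF-8 chosen only as an example)
my $outgoing = encode('UTF-8', $good_internal_string);

printf "%d bytes in, %d characters, %d bytes out\n",
    length $incoming, length $good_internal_string, length $outgoing;
```

This prints `4 bytes in, 4 characters, 5 bytes out`: the string is four characters inside Perl regardless of which byte encoding it arrived in or leaves as.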

These are the pages I read over and over and over again until my pages magically work:

:-) I see *exactly* what you mean. I've read these pages over and over too.

I don't understand the reason for this seemingly random behaviour.
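If the behaviour varies because some uploads arrive as UTF-8 and others as ISO-8859-1 (an assumption; the thread does not establish this), a common workaround is to try a strict UTF-8 decode first and fall back to ISO-8859-1, which accepts any byte sequence. `decode_guess` below is a hypothetical helper built on Encode's real `FB_CROAK` check mode. It is a heuristic: ISO-8859-1 text that happens to form valid UTF-8 will be decoded as UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode $bytes as UTF-8 if they are valid UTF-8,
# otherwise fall back to ISO-8859-1.
sub decode_guess {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode with a CHECK argument may consume its input
    my $chars = eval { decode('UTF-8', $copy, Encode::FB_CROAK()) };
    return defined $chars ? $chars : decode('iso-8859-1', $bytes);
}

# The same word in both encodings decodes to the same four characters
for my $sample ("caf\xe9", "caf\xc3\xa9") {
    my $chars = decode_guess($sample);
    printf "%d bytes -> %d chars\n", length $sample, length $chars;
}
```

This prints `4 bytes -> 4 chars` and then `5 bytes -> 4 chars`.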

Thanks,

JC
