On Sep 15, 2005, at 07:05, Steve Larson wrote:

What I want to do is add a version string comment at the beginning of .xml files. I test to see if the file is UNICODE (Encode::Unicode) or ASCII (Encode::XS) using guess_encoding. My ASCII case works fine but the regexp for the UNICODE case fails. The snippet below is the code for the UNICODE case.

The answer is that PerlIO does not go well with BOMed UTFs. What you should do instead is read the whole file first, like this:

use Encode qw(decode encode);

open my $in, "<:raw", $filename or die "$filename : $!";
read $in, my $buf, -s $filename; # one of many ways to slurp file.
close $in;
my $content = decode("UTF-16", $buf); # LE or BE is not required; the BOM decides.
#
# do whatever you want to $content and....
#
open my $out, ">:raw", $filename or die "$filename : $!";
# now be explicit on endianness; UTF-16LE writes no BOM, so prepend
# "\x{FEFF}" to $content if the file should keep one
print $out encode("UTF-16LE", $content);
close $out;

Remember UTF-(16|32) does not go well with stream models. Treat it as a binary file.
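
For the original question (adding a version comment to a UTF-16 .xml file), the decode, edit, encode steps might look like the sketch below. It is only an illustration: add_version_comment is a hypothetical helper name, the comment format is made up, and it assumes the input starts with a BOM (decode("UTF-16", ...) croaks without one) and that a comment should go after the XML declaration when one is present.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: takes the raw bytes of a BOMed UTF-16 XML file and
# a version string, returns the raw bytes with a version comment added.
sub add_version_comment {
    my ($bytes, $version) = @_;

    # "UTF-16" reads the BOM to pick the byte order, then strips it.
    my $content = decode("UTF-16", $bytes);

    # Illustrative comment format, not taken from the original post.
    my $comment = "<!-- version: $version -->\n";

    # Nothing may precede the XML declaration, so insert right after it
    # when there is one; otherwise put the comment at the very top.
    unless ($content =~ s/\A(<\?xml.*?\?>\r?\n?)/$1$comment/s) {
        $content = $comment . $content;
    }

    # UTF-16LE emits no BOM by itself, so prepend U+FEFF explicitly.
    return encode("UTF-16LE", "\x{FEFF}" . $content);
}

# Usage with an in-memory sample instead of a real file.
my $sample  = encode("UTF-16LE", "\x{FEFF}<?xml version=\"1.0\"?>\n<root/>\n");
my $updated = add_version_comment($sample, "1.2.3");
print decode("UTF-16", $updated);
```

Since the regexp runs on the decoded character string rather than on the raw bytes, it can be the same pattern that already works for the ASCII case. Reading and writing the file itself would still go through the :raw handles shown above.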

Dan the Encode Maintainer
