Summary: How can I use Perl 5.8.0 to handle files encoded in UTF-16 on
Windows?
Details:
I have read that Perl 5.8 ought to handle UTF-16 without my needing to
tell it anything.
But I am not getting the behavior I expect.
Specifically, I want to find what changed in a Registry after I install
a program.
So I export the whole Windows Registry to a *.txt file. This file is
written using UTF-16 (technically UTF-16LE, because Intel is little-endian).
Then I install the program, and export the Registry again to a second
file.
These files are very large, over 100 MB. So the port of diff.exe to
Windows quickly dies, saying
diff: memory exhausted
I then tried diff.pl (which uses diff.pm) and watched the memory usage
slowly grow to over 100 MB; I never got any output. So I decided to
reduce the number of lines in the file by removing all the "binary" data
(which in the text file is plain text, matching this pattern: ^\d{8}).
However, the following command-line Perl program fails, in that it emits
every input line to the output. I suspect this problem is caused by the
fact that the file is UTF-16.
perl -ne "print if ! m/^\d{8}/" reg1.txt > reg1_reduced.txt
Note: \d is equivalent to [0-9]; using [0-9] instead also failed.
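
One plausible explanation, offered as an assumption rather than a confirmed diagnosis: without a decoding layer, Perl sees the raw bytes of the file, and in UTF-16LE every ASCII digit is followed by a NUL byte, so eight digits never appear in a row. The sketch below illustrates this with the Encode module; the sample line is invented for illustration and is not taken from a real Registry export.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical sample line; real Registry exports will differ.
my $text = "00000010 de ad be ef\r\n";

# The bytes as they would sit in a UTF-16LE file:
# each ASCII character is followed by a NUL byte.
my $bytes = encode('UTF-16LE', $text);

# Against the raw bytes the pattern fails, because the second
# byte is NUL rather than a digit.
my $raw_match = ($bytes =~ /^\d{8}/) ? 1 : 0;

# After decoding the bytes to Perl characters, the same pattern matches.
my $decoded       = decode('UTF-16LE', $bytes);
my $decoded_match = ($decoded =~ /^\d{8}/) ? 1 : 0;

print "raw bytes match:    $raw_match\n";
print "decoded text match: $decoded_match\n";
```

If this is indeed the cause, the pattern itself is fine and the input only needs to be decoded before matching.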
I then tried to allow for the NUL bytes in the character class and used this:
perl -ne "print if ! m/^[0-9\000]{8}/" reg1.txt > reg1_reduced.txt
But that somehow caused the newlines to disappear.
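
For comparison, here is a minimal sketch of the same filter written with explicit I/O layers instead of byte-level patterns. It assumes both files are UTF-16LE and that the installed Perl supports the :encoding layer; the name filter_registry_dump is hypothetical, and this has not been tried against a real 100 MB export.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in_path to $out_path, dropping every line
# that starts with eight digits. Both files are treated as UTF-16LE.
sub filter_registry_dump {
    my ($in_path, $out_path) = @_;

    # :raw removes the default Windows :crlf layer so it cannot act on
    # undecoded bytes; :encoding(UTF-16LE) then converts to characters.
    open my $in, '<:raw:encoding(UTF-16LE)', $in_path
        or die "Cannot read $in_path: $!";
    open my $out, '>:raw:encoding(UTF-16LE)', $out_path
        or die "Cannot write $out_path: $!";

    # Lines are read one at a time, so memory use stays small.
    # A leading byte order mark, if present, stays attached to the
    # first line, which therefore never matches and is copied as-is.
    while (my $line = <$in>) {
        print {$out} $line unless $line =~ /^\d{8}/;
    }

    close $in;
    close $out or die "Cannot finish writing $out_path: $!";
    return;
}
```

A call such as filter_registry_dump('reg1.txt', 'reg1_reduced.txt') would then stand in for the one-liner. Because no :crlf layer is active, each "\r\n" pair passes through unchanged as ordinary characters.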
So I am asking for help.
Thanks,
Steve
_______________________________________________
Boston-pm mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/boston-pm