Summary:  How can I use Perl 5.8.0 to handle files encoded in UTF-16 on
Windows?

Details:
I have read that Perl 5.8 ought to handle UTF-16 without my needing to
tell it anything.
But I am not getting the behavior I expect.
Specifically, I want to find what changed in a Registry after I install
a program.
So I export the whole Windows Registry to a *.txt file.  This file is
written in UTF-16 (technically UTF-16LE, because Intel is little-endian).
Then I install the program, and export the Registry again to a second
file.
These files are very large, over 100 MB, so the Windows port of diff.exe
quickly dies, saying
diff: memory exhausted

I then tried diff.pl (which uses diff.pm) and watched the memory usage
slowly grow to over 100 MB; I never got any output.  So I decided to
reduce the number of lines in the file by removing all the "binary" data
(which in the text file is plain text, matching this pattern: ^\d{8}).

However, the following command-line Perl program fails: it emits every
input line to the output.  I suspect the problem is that the file is
UTF-16.

perl -ne "print if ! m/^\d{8}/" reg1.txt > reg1_reduced.txt

Note: \d is equivalent to [0-9]  -- using that failed also.
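
To illustrate why I suspect the encoding, here is a small sketch of what
a byte-oriented regex would see.  The sample line is made up, and I am
assuming Perl reads the file as raw bytes unless told otherwise: in
UTF-16LE every ASCII character is followed by a NUL byte, so there are
never eight consecutive digit bytes for ^\d{8} to match.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sample line; real registry export lines will differ.
my $text  = "00000010   de ad be ef\r\n";
my $bytes = encode('UTF-16LE', $text);

# Each ASCII character becomes two bytes: the character, then a NUL.
my $hex = join ' ', map { sprintf '%02x', ord } split //, substr($bytes, 0, 8);
print "first bytes: $hex\n";

# The pattern needs eight consecutive digits, but in the raw bytes a NUL
# sits between each pair of digits, so only the decoded text matches.
my $raw_matches     = $bytes =~ /^\d{8}/ ? 1 : 0;
my $decoded_matches = $text  =~ /^\d{8}/ ? 1 : 0;
print "raw bytes match: $raw_matches, decoded text match: $decoded_matches\n";
```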

I then tried to include the NUL bytes and used this:
perl -ne "print if ! m/^[0-9\000]{8}/" reg1.txt > reg1_reduced.txt
But that somehow caused the newlines to disappear.
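
One variant that may be worth trying is to name the encoding explicitly
with a PerlIO layer instead of relying on Perl to detect it.  This is
only a sketch under assumptions: that the export really is UTF-16LE,
and that the :encoding layer behaves in 5.8.0 as the PerlIO
documentation describes.  The subroutine name is made up for
illustration.

```perl
use strict;
use warnings;

# Copy $in to $out, dropping lines that start with eight digits.
# Assumption: both files are UTF-16LE.  ":raw" turns off CRLF translation,
# so "\r\n" passes through unchanged as two ordinary characters.
sub filter_registry_export {
    my ($in, $out) = @_;
    open my $src, '<:raw:encoding(UTF-16LE)', $in
        or die "Cannot read $in: $!";
    open my $dst, '>:raw:encoding(UTF-16LE)', $out
        or die "Cannot write $out: $!";
    while (my $line = <$src>) {
        # A byte order mark, if present, is just the first character of
        # the first line and is copied through along with that line.
        print {$dst} $line unless $line =~ /^\d{8}/;
    }
    close $src;
    close $dst or die "Cannot close $out: $!";
}

# Usage: perl filter.pl reg1.txt reg1_reduced.txt
filter_registry_export(@ARGV) if @ARGV == 2;
```

Because the lines are decoded before the regex runs, the original
pattern ^\d{8} can be used as is, with no NUL bytes in the character
class.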

So I am asking for help.

Thanks,
Steve
 