On Sat, Mar 15, 2003 at 08:32:24PM +0000, Markus Kuhn wrote:
: The attached Perl script prints excerpts from all lines in a plaintext file
: that contain non-ASCII bytes. With option -m, it looks for malformed and
: overlong UTF-8 sequences instead. Useful for reviewing files with
: unknown encoding manually.
I haven't tried it, but from a cursory inspection I don't believe
it'll work under Perl 5.8.0 in any UTF-8 locale, since 5.8.0 then treats
the standard filehandles as UTF-8 text rather than raw bytes. To fix that,
throw one of these in there at the top:
    use bytes;
    binmode(STDIN,":bytes");
    use open IO => ':bytes:std';
And if you also want it to work with ancient versions of Perl, your
best bet is something like:
    eval 'binmode(STDIN,":bytes"); binmode(STDOUT,":bytes")';
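For illustration, here is a minimal sketch of how those pieces might fit into a byte-oriented scanner. It is not the attached script: it assumes Perl 5.6 or later (for three-argument open with a lexical filehandle), takes file names on the command line, only reports lines containing bytes >= 0x80, and does not attempt the -m check for malformed or overlong sequences. The sub name non_ascii_lines is hypothetical.

```perl
#!/usr/bin/perl
# Hypothetical sketch, NOT the attached script: list the lines of each
# file named on the command line that contain non-ASCII bytes (>= 0x80).
use strict;
use warnings;
use bytes;    # byte semantics for regexes, length() and friends

# Hypothetical helper: return "lineno: text" for every line that
# holds at least one byte in the range 0x80..0xFF.
sub non_ascii_lines {
    my @hits;
    my $n = 0;
    for my $line (@_) {
        $n++;
        push @hits, "$n: $line" if $line =~ /[\x80-\xFF]/;
    }
    return @hits;
}

# Drop any :utf8 layer a UTF-8 locale may have put on STDOUT, so the
# bytes we print are not re-encoded; in a string eval, a Perl that
# doesn't understand this call just skips it.
eval 'binmode(STDOUT, ":bytes")';

for my $file (@ARGV) {
    open(my $fh, '<', $file) or die "$file: $!\n";
    binmode($fh);    # binary mode: read raw bytes whatever the locale says
    print "$file:$_" for non_ascii_lines(<$fh>);
    close($fh);
}
```

Invoked as, say, `perl nonascii.pl somefile.txt` (the script name is made up), it prints `somefile.txt:` followed by the line number and the raw line for each hit. Reading via binmode on the input handle keeps the data as bytes even if the locale would otherwise decode it as UTF-8.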
Sorry 'bout that. I didn't expect RedHat 8.0 to turn on UTF-8 for you
by default, and I shouldn't have believed what I read on this mailing list
about the degree of commitment implied by UTF-8 locales... :-)
Larry
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/