On Mon, Feb 02, 2004 at 12:21:40PM -0800, Larry Wall wrote:
> locales for everyone willy nilly. So 5.8.1 backed off on that, with
> the result that you have to be a little more intentional about your
> input formats (or set the PERL_UNICODE environment variable).

What's the normal way to say "use the locale, like every other Unix
program that processes text"? Setting PERL_UNICODE seems to make it
*always* use Unicode:

04:39pm [EMAIL PROTECTED]/5 [~] export LANG=en_US.ISO-8859-1
04:39pm [EMAIL PROTECTED]/5 [~] perl -ne 'if(/^(\x{fa})$/) { print "$1\n"; }'
ú
ú
04:39pm [EMAIL PROTECTED]/5 [~] export PERL_UNICODE=1
04:39pm [EMAIL PROTECTED]/5 [~] perl -ne 'if(/^(\x{fa})$/) { print "$1\n"; }'
ú

Also, with PERL_UNICODE=1 in en_US.UTF-8, entering ú outputs one byte,
0xfa (the codepoint), instead of 0xc3 0xba; why?

This is perl, v5.8.2 built for i386-linux-thread-multi
(It's a shame that Perl doesn't behave like everyone else and obey
locale settings correctly; I thought we were finally getting away
from having to tell each program individually to use UTF-8. I don't
understand the logic of "RedHat set the locale to UTF-8 prematurely,
so Perl shouldn't obey the locale".)
--
Glenn Maynard
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/