Re: Perl & unicode weirdness.

Glenn Maynard Mon, 02 Feb 2004 14:52:49 -0800

On Mon, Feb 02, 2004 at 12:21:40PM -0800, Larry Wall wrote:
> locales for everyone willy nilly.  So 5.8.1 backed off on that, with
> the result that you have to be a little more intentional about your
> input formats (or set the PERL_UNICODE environment variable).


What's the normal way to say "use the locale, like every other Unix
program that processes text"?  Setting PERL_UNICODE seems to make it
*always* use Unicode:

04:39pm [EMAIL PROTECTED]/5 [~] export LANG=en_US.ISO-8859-1
04:39pm [EMAIL PROTECTED]/5 [~] perl -ne 'if(/^(\x{fa})$/) { print "$1\n"; }'
ú
ú
04:39pm [EMAIL PROTECTED]/5 [~] export PERL_UNICODE=1
04:39pm [EMAIL PROTECTED]/5 [~] perl -ne 'if(/^(\x{fa})$/) { print "$1\n"; }'
ú

Also, with PERL_UNICODE=1 in en_US.UTF-8, entering ú outputs one byte,
0xfa (the codepoint), instead of 0xc3 0xba; why?

This is perl, v5.8.2 built for i386-linux-thread-multi

(It's a shame that Perl doesn't behave like everyone else and obey
locale settings correctly; I thought we were finally getting away
from having to tell each program individually to use UTF-8.  I don't
understand the logic of "RedHat set the locale to UTF-8 prematurely,
so Perl shouldn't obey the locale".)

-- 
Glenn Maynard

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
