Re: Perl & unicode weirdness.

Larry Wall Mon, 02 Feb 2004 13:26:50 -0800

On Mon, Feb 02, 2004 at 09:00:50AM +0000, Edmund GRIMLEY EVANS wrote:
: Markus Kuhn <[EMAIL PROTECTED]>:
: 
: > Perl Unicode support before 5.8.0 was experimental, incomplete and in
: > practice not useable. Perl 5.8.0 worked pretty smoothly for me, I
: > discovered in my own use only one single UTF-8-related bug to do with
: > regular expressions, and that was fixed in 5.8.1.
: 
: I happen to have 5.8.0 on machines at work, so I'd be interested to
: know what that bug is.


I can't answer for Markus's bug, but the main problem with 5.8.0 is
that there was a cultural bug in that Perl 5.8.0 paid attention to
whether you were in a UTF-8 locale, and magically made all inputs
Unicodified on the assumption that you wouldn't choose to use a
Unicode locale unless you knew what you were doing.  Unfortunately,
that didn't turn out to be the case, after Red Hat turned on UTF-8
locales for everyone willy-nilly.  So 5.8.1 backed off on that, with
the result that you have to be a little more intentional about your
input formats (or set the PERL_UNICODE environment variable).
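
As a minimal sketch of what "being intentional" can look like in 5.8.1
and later: the layer is named explicitly when the handle is opened.  The
in-memory file and the sample string are assumptions made only so the
example is self-contained; a real script would open a real file.

```perl
use strict;
use warnings;

# "\xC3\xA9" is the two-byte UTF-8 encoding of U+00E9 (e with acute).
my $bytes = "caf\xC3\xA9\n";

# Read the same data twice from an in-memory file:
# once as raw bytes, once through an explicit decoding layer.
open(my $raw, '<:raw', \$bytes) or die "open raw: $!";
my $raw_line = <$raw>;
close $raw;

open(my $utf8, '<:encoding(UTF-8)', \$bytes) or die "open utf8: $!";
my $utf8_line = <$utf8>;
close $utf8;

chomp($raw_line, $utf8_line);

# The raw read yields 5 bytes; the decoded read yields 4 characters.
printf "raw: %d, decoded: %d\n", length($raw_line), length($utf8_line);
```

The point is only that the script, not the locale, decides which handles
are decoded.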

: > I had a lot of Perl 5.0 scripts that processed UTF-8 before there was any
: > UTF-8 support in Perl. They continue to work with "use bytes;" added, but
: > they got significantly simpler by using Perl's Unicode facilities.
: 
: I've never found that "use bytes;" or "no utf8;" helps much: my old scripts
: containing regular expressions with UTF-8 characters in them still
: break with newer versions of perl. The thing I end up adding to my old
: scripts is "binmode(STDIN, ':crlf');".

Which looks like a workaround to the 5.8.0 problem to me.
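
One plausible way this breakage arises, sketched under the assumption
that the old script's pattern is written in terms of UTF-8 bytes: once
the input has been decoded into characters behind the script's back, a
byte-oriented pattern no longer matches.  Encode::decode stands in here
for the automatic decoding that 5.8.0 applied in a UTF-8 locale.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The same text as undecoded UTF-8 bytes and as decoded characters.
my $bytes = "caf\xC3\xA9";
my $chars = decode('UTF-8', $bytes);   # ends in the single character U+00E9

# A byte-oriented pattern of the kind a pre-Unicode script would contain.
my $old_pattern = qr/\xC3\xA9/;

my $matches_bytes = $bytes =~ $old_pattern ? 1 : 0;
my $matches_chars = $chars =~ $old_pattern ? 1 : 0;

print "bytes: $matches_bytes, chars: $matches_chars\n";
```

The pattern matches the raw bytes but not the decoded string, which is
why forcing the handle back to bytes makes such scripts work again.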

: Question: How can I ensure the files opened by "while (<>) { ... }"
: have binmode applied to them?

In theory "use open" should do that.  Upgrading from 5.8.0 might be
the best solution, however.
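
A hedged sketch of the "use open" approach: the pragma sets a default
input layer for its lexical scope, and files opened implicitly by <>
inside that scope should pick it up.  The temporary file and the choice
of the :utf8 layer are assumptions for illustration; behaviour on 5.8.0
itself may differ.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small UTF-8 file to stand in for one named on the command line.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode($tmp);                       # write the bytes unmodified
print {$tmp} "caf\xC3\xA9\n";
close $tmp or die "close: $!";

my @lengths;
{
    use open IN => ':utf8';          # default input layer within this block
    local @ARGV = ($path);           # the files that <> will open
    while (my $line = <>) {
        chomp $line;
        push @lengths, length $line; # counted in characters if decoded
    }
}

print "@lengths\n";
```

Swapping ':utf8' for ':bytes' in the same position would instead keep
the input as undecoded bytes for old byte-oriented scripts.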

Larry

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
