On Mon, Mar 26, 2007 at 05:28:43PM -0400, SrinTuar wrote:
> I frequently run into problems with UTF-8 in Perl, and I was wondering
> if anyone else had encountered similar things.
>
> One thing I've noticed is that when processing characters, I often get
> warnings such as "Wide character in print", or have input/output get
> horribly mangled.
>
> I've been trying to work around it in various ways, commonly doing
> things such as:
>     binmode STDIN, ":utf8";
>     binmode STDOUT, ":utf8";
>
> or using functions such as:
>     use Encode;
>
>     # Decode each referenced scalar in place; croak on malformed input.
>     sub unfunge_string
>     {
>         foreach my $ref (@_)
>         {
>             $$ref = Encode::decode("utf8", $$ref, Encode::FB_CROAK);
>         }
>     }
>
>
> but this feels wrong to me.
>
> For a language that really goes out of its way to support encodings, I
> wonder if it wouldn't have been better off if it just ignored the entire
> concept altogether and treated strings as arrays of bytes...

Read the ancient linux-utf8 list archives and you should get a good
feel for Larry Wall's views on the matter.

> I've found pages wherein people complain of similar problems, such as:
> http://ahinea.com/en/tech/perl-unicode-struggle.html
>
> And I'm wondering if, in its attempt to be a good i18n citizen, Perl
> hasn't gone overboard and made a mess of things instead.

I agree, but maybe there are workarounds. I have a system that's
completely UTF-8-only. I don't have or want support for any legacy
encodings except in a few isolated tools (certainly nothing related to
perl) for converting legacy data I receive from outside.

With that in mind, I built perl without PerlIO, wanting to take
advantage of my much smaller and faster stdio implementation. But now,
binmode doesn't work, so the only way I can get rid of the nasty
warning is by disabling it explicitly.

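For reference, here is a rough sketch of what disabling the warning
looks like, next to the by-hand alternative of encoding to bytes before
printing. The variable names and the sample string are made up for
illustration, and I am assuming the Encode module still loads on a perl
built without PerlIO, which I have not verified.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical sample: a character string with a code point above 0xFF
# (U+20AC EURO SIGN).
my $text = "price: \x{20AC}5";

# Option 1: switch off the "Wide character in print" warning lexically.
# Perl then writes the string's internal UTF-8 form as-is.
{
    no warnings 'utf8';
    print $text, "\n";
}

# Option 2: encode to UTF-8 bytes by hand, so print only ever sees bytes
# and no I/O layer (binmode) is involved.
my $bytes = Encode::encode('UTF-8', $text);
print $bytes, "\n";
```

Option 2 is more typing, but it does not depend on how perl happens to
store the string internally, so it should also behave for strings whose
only non-ASCII characters fall in the 0x80-0xFF range.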
Is there any way to get perl to behave sanely in this regard? I don't
really use perl much (mainly for irssi), so if not, I guess I'll just
leave it how it is and hope nothing seriously breaks.

Rich
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/