Re: Perl & unicode weirdness.

jmaiorana Mon, 02 Feb 2004 14:59:41 -0800

(To avoid confusion, we don't call our encoding UTF-8. We tend to say UTF-8 when we mean UTF-8, and "utf8" when we mean the more general not-necessarily-Unicode encoding.)
This is an insane way to make a distinction, just as silly as trying to
differentiate between "kilobits" and "kilobytes" with "kb" and "kB".
Changing hyphens and case doesn't make distinctions or avoid confusion.

I think he meant that the Perl UTF-8 implementation wasn't excessively
restrictive, not so much that it contained a unique or incompatible
encoding.

I personally think filtering the code-point range is a separate concern
from encoding itself. I don't think you would want a UTF-32 input stream
to start dropping words just because they exceed 0x10FFFF.
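
To illustrate the separation, here is a minimal sketch in Python (not Perl,
and not Perl's actual implementation). It assumes the original 1-to-6-byte
UTF-8 bit layout as the "general" encoding; the function names
encode_general, is_unicode_scalar and encode_strict are made up for this
example. The encoder only packs bits, and the Unicode range check is a
separate function layered on top.

```python
def encode_general(cp: int) -> bytes:
    """Pack an integer into the original UTF-8 bit layout (1 to 6 bytes).

    No Unicode range check happens here: this is purely the encoding step.
    """
    if cp < 0:
        raise ValueError("negative value")
    if cp < 0x80:
        return bytes([cp])
    # (sequence length, exclusive upper limit, lead-byte marker)
    layouts = (
        (2, 0x800, 0xC0),
        (3, 0x10000, 0xE0),
        (4, 0x200000, 0xF0),
        (5, 0x4000000, 0xF8),
        (6, 0x80000000, 0xFC),
    )
    for nbytes, limit, lead in layouts:
        if cp < limit:
            out = []
            # Emit continuation bytes from the low-order end, 6 bits each.
            for _ in range(nbytes - 1):
                out.append(0x80 | (cp & 0x3F))
                cp >>= 6
            # The remaining high bits go into the lead byte.
            out.append(lead | cp)
            return bytes(reversed(out))
    raise ValueError("value does not fit in 6 bytes")


def is_unicode_scalar(cp: int) -> bool:
    """The range filter: within 0..0x10FFFF and not a surrogate."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def encode_strict(cp: int) -> bytes:
    """Strict variant: the same encoder with the filter applied first."""
    if not is_unicode_scalar(cp):
        raise ValueError("not a Unicode scalar value: %#x" % cp)
    return encode_general(cp)
```

For any value the filter accepts, both functions produce the same bytes,
e.g. encode_general(0x20AC) and encode_strict(0x20AC) both give E2 82 AC.
They differ only in that encode_strict(0x110000) raises, while
encode_general(0x110000) gives F4 90 80 80.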

So, imho, wrt the terms "UTF-8" and "utf8", there is no difference in
"encoding", and hence no confusion.

(It's a shame that Perl doesn't behave like everyone else and obey
locale settings correctly; I thought we were finally getting away
from having to tell each program individually to use UTF-8.  I don't
understand the logic of "RedHat set the locale to UTF-8 prematurely,
so Perl shouldn't obey the locale".)

I think it is a practical default, because most programmers and most existing code expect binary I/O.


--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
