(To avoid confusion, we don't call our encoding UTF-8. We tend to
say UTF-8 when we mean UTF-8, and "utf8" when we mean the more general
not-necessarily-Unicode encoding.)



This is an insane way to make a distinction, just as silly as trying to differentiate between "kilobits" and "kilobytes" with "kb" and "kB". Changing hyphens and case doesn't make distinctions or avoid confusion.

I think he meant that the Perl UTF-8 implementation wasn't excessively
restrictive, not so much that it contained a unique or incompatible
encoding.

I personally think filtering the code-point range is a separate concern
from encoding itself. I don't think you would want a UTF-32 input stream
to start dropping words just because they exceed 0x10FFFF.

So, imho, wrt the terms "UTF-8" and "utf8", there is no difference in
"encoding", and hence no confusion.
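
To make the "separate concern" point concrete, here is a rough sketch in
Python (not Perl's actual implementation). It assumes the 1-6 byte bit
layout from the original UTF-8 definition (RFC 2279), and the names
encode_utf8_generalized and is_unicode_scalar are hypothetical, made up
for this illustration. The encoder only packs bits; the range check is a
separate function that a caller may or may not apply.

```python
def encode_utf8_generalized(cp: int) -> bytes:
    """Pack an integer into the 1-6 byte UTF-8 bit layout.

    Hypothetical helper: it does no Unicode range filtering at all.
    """
    if cp < 0 or cp > 0x7FFFFFFF:
        raise ValueError("value does not fit the 1-6 byte layout")
    if cp < 0x80:
        return bytes([cp])  # single byte, identical to ASCII
    # (exclusive upper bound, total length, lead-byte marker)
    for limit, length, lead in (
        (0x800, 2, 0xC0),
        (0x10000, 3, 0xE0),
        (0x200000, 4, 0xF0),
        (0x4000000, 5, 0xF8),
        (0x80000000, 6, 0xFC),
    ):
        if cp < limit:
            break
    out = bytearray(length)
    # Fill continuation bytes from the end, 6 payload bits each.
    for i in range(length - 1, 0, -1):
        out[i] = 0x80 | (cp & 0x3F)
        cp >>= 6
    out[0] = lead | cp  # remaining high bits go into the lead byte
    return bytes(out)


def is_unicode_scalar(cp: int) -> bool:
    """Separate policy check: inside the Unicode range, not a surrogate."""
    return 0 <= cp <= 0x10FFFF and not 0xD800 <= cp <= 0xDFFF


# Inside the Unicode range the bytes match a strict UTF-8 encoder.
euro = encode_utf8_generalized(0x20AC)
# Outside the range the same bit layout still works; only the policy
# check objects.
beyond = encode_utf8_generalized(0x110000)
```

In this sketch a strict "UTF-8" writer would call is_unicode_scalar
before encoding, while a lax "utf8" writer would skip it. The byte
layout is the same either way.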

(It's a shame that Perl doesn't behave like everyone else and obey
locale settings correctly; I thought we were finally getting away
from having to tell each program individually to use UTF-8.  I don't
understand the logic of "RedHat set the locale to UTF-8 prematurely,
so Perl shouldn't obey the locale".)

I think it is a practical default, because most programmers and existing
code tend to expect binary I/O.







-- Linux-UTF8: i18n of Linux on all levels Archive: http://mail.nl.linux.org/linux-utf8/


