(To avoid confusion, we don't call our encoding UTF-8. We tend to
say UTF-8 when we mean UTF-8, and "utf8" when we mean the more general
not-necessarily-Unicode encoding.)
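For what it's worth, the practical difference the quoted text is describing
shows up in the Encode module along these lines. This is only a sketch: the
behaviour for the default CHECK flag has varied between Encode versions,
which is why it passes FB_CROAK explicitly.

    #!/usr/bin/perl
    # Sketch: Perl's lax "utf8" vs strict "UTF-8" names in Encode.
    use strict;
    use warnings;
    no warnings 'utf8';            # silence "not Unicode" warnings for the demo
    use Encode qw(encode FB_CROAK);

    my $beyond = chr(0x110000);    # one past the last Unicode code point

    # Lax "utf8" is Perl's internal extension: it will happily encode
    # code points above U+10FFFF (and surrogates).
    my $lax = encode('utf8', $beyond);
    printf "lax utf8 produced %d octets\n", length $lax;

    # Strict "UTF-8" enforces the Unicode range; with FB_CROAK it refuses.
    my $strict = eval { encode('UTF-8', $beyond, FB_CROAK) };
    print defined $strict ? "strict UTF-8 accepted it\n"
                          : "strict UTF-8 rejected it: $@";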
This is an insane way to make a distinction, just as silly as trying to differentiate between "kilobits" and "kilobytes" with "kb" and "kB". Changing hyphens and case doesn't make distinctions or avoid confusion.
I think he meant that the Perl utf8 implementation wasn't excessively restrictive, not so much that it contained a unique or incompatible encoding.
I personally think filtering the code-point range is a separate concern from the encoding itself. I don't think you would want a UTF-32 input stream to start dropping words just because they exceed 0x10FFFF.
So, imho, with respect to the terms "UTF-8" and "utf8", there is no difference in "encoding", and hence no confusion.
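To make that concrete, here is roughly what I mean by keeping the two
concerns apart: decode with the permissive encoding, then apply the Unicode
range filter as its own step. Just a sketch; in_unicode_range is a throwaway
helper, not anything from a module, and the lax decoder's behaviour has
shifted a little across Encode versions.

    #!/usr/bin/perl
    # Sketch: range validation as a step separate from decoding.
    use strict;
    use warnings;
    no warnings 'utf8';            # the demo deliberately handles a non-Unicode code point
    use Encode qw(decode);

    # Throwaway helper: true if every character is a Unicode scalar value,
    # i.e. no surrogates and nothing past U+10FFFF.
    sub in_unicode_range {
        my ($string) = @_;
        return $string !~ /[^\x{0000}-\x{D7FF}\x{E000}-\x{10FFFF}]/;
    }

    my $octets  = "\xF4\x90\x80\x80";       # the 4-byte sequence for 0x110000
    my $decoded = decode('utf8', $octets);  # lax decode keeps the code point
    print in_unicode_range($decoded) ? "in range\n" : "out of range\n";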
(It's a shame that Perl doesn't behave like everyone else and obey locale settings correctly; I thought we were finally getting away from having to tell each program individually to use UTF-8. I don't understand the logic of "RedHat set the locale to UTF-8 prematurely, so Perl shouldn't obey the locale".)

I think that because most programmers and existing code tend to expect binary i/o, it is a practical default setting.
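For anyone who does want Perl to follow the locale, you can opt in per
script or per invocation; something like the sketch below, assuming a
reasonably recent perl. Check perldoc open and perlrun for the exact
spelling on your version.

    #!/usr/bin/perl
    # Sketch: opting in to locale-driven I/O layers, since binary I/O is
    # the default. Roughly equivalent on the command line: perl -CSDL,
    # where the "L" makes the UTF-8 layers conditional on the locale.
    use strict;
    use warnings;
    use open qw(:std :locale);   # derive the encoding for STDIN/STDOUT/STDERR
                                 # and default open() layers from the locale

    print "\x{20AC}\n";          # a euro sign, encoded per the locale instead
                                 # of triggering a "Wide character" warning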
