On Sunday, 03. March 2002 21:21, Tod Harter wrote:

> encode the resulting paths. Realistically even UTF-8 is a hack. ALL the
> software and standards need to be updated, badly. Ideally all software
> should be able to deal with any incoming encoding, and really everything
> should be UTF-16 internally. At least then you have a fighting chance of
> representing an encoding in a consistent internal form. I'd give it about
> 40 years...

So UTF-8 is a hack, but UTF-16 is not? What makes you think so? Perhaps you 
are uncomfortable with the idea of characters of differing byte lengths? Then 
I must disappoint you: UTF-16 uses surrogate pairs for characters beyond 
plane 0 (the first 65,536 code points), and there are already alphabets 
located in plane 1 (a 'Lord of the Rings' book would use it :-).
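
A minimal Python sketch of the surrogate-pair arithmetic, for illustration 
only: the helper name utf16_code_units is made up, and U+10330 GOTHIC LETTER 
AHSA is picked simply because Gothic is one of the scripts already in plane 1. 
The result is cross-checked against Python's built-in utf-16-be codec.

```python
def utf16_code_units(cp: int) -> list[int]:
    """Split one code point into its UTF-16 code units (hypothetical helper)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not encodable in UTF-16: {cp:#x}")
    if cp < 0x10000:
        return [cp]                 # plane 0: a single 16-bit unit
    v = cp - 0x10000                # 20 bits remain after removing the offset
    high = 0xD800 + (v >> 10)       # high surrogate carries the top 10 bits
    low = 0xDC00 + (v & 0x3FF)      # low surrogate carries the bottom 10 bits
    return [high, low]


gothic_ahsa = 0x10330               # GOTHIC LETTER AHSA, plane 1
units = utf16_code_units(gothic_ahsa)
print([hex(u) for u in units])      # ['0xd800', '0xdf30']

# Cross-check against Python's own codec (big-endian, no BOM).
assert chr(gothic_ahsa).encode("utf-16-be") == b"".join(
    u.to_bytes(2, "big") for u in units
)
```

So one plane-1 character costs two 16-bit units, exactly the kind of 
variable-length behaviour UTF-8 is blamed for.
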
Actually, UTF-16 is _not_ able to encode the whole 31-bit UCS-4 code space, 
while UTF-8 (as originally defined, with sequences of up to six bytes) is. 
(The practical value of this is rather doubtful, though, as there are 
proposals to clip the code space to U+10FFFF, which fits in 21 bits and 
ought to be enough for every character. UTF-16 covers exactly that range.)
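
The range argument can be checked with a small table-driven Python sketch. The 
limits are the sequence-length boundaries of the original UTF-8 definition 
(RFC 2279, up to six bytes, ending at 0x7FFFFFFF); the function and constant 
names are made up for illustration. Python's own UTF-8 codec follows the 
later, clipped definition and refuses anything above U+10FFFF, so the 
comparison with it is only made inside that range.

```python
# Upper bound for each sequence length in the original UTF-8 (RFC 2279):
# 1 byte up to U+007F, ..., 6 bytes up to 0x7FFFFFFF (31 bits).
UTF8_RFC2279_LIMITS = [0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF]
UTF16_MAX = 0x10FFFF  # highest code point reachable with surrogate pairs


def utf8_len_rfc2279(cp: int) -> int:
    """Bytes needed for cp under the original six-byte UTF-8 scheme."""
    if cp < 0:
        raise ValueError("code points are non-negative")
    for n, limit in enumerate(UTF8_RFC2279_LIMITS, start=1):
        if cp <= limit:
            return n
    raise ValueError(f"{cp:#x} is outside the 31-bit UCS-4 code space")


def fits_utf16(cp: int) -> bool:
    """True if cp is at or below the UTF-16 ceiling U+10FFFF."""
    return 0 <= cp <= UTF16_MAX


for cp in (0x41, 0xE9, 0x20AC, 0x10330, 0x10FFFF, 0x7FFFFFFF):
    verdict = "ok" if fits_utf16(cp) else "impossible"
    print(f"U+{cp:04X}: {utf8_len_rfc2279(cp)} UTF-8 bytes, UTF-16 {verdict}")
    if fits_utf16(cp):
        # Within U+10FFFF, Python's (clipped) UTF-8 codec agrees on the length.
        assert utf8_len_rfc2279(cp) == len(chr(cp).encode("utf-8"))
```

The last line of output is the interesting one: 0x7FFFFFFF needs six bytes in 
old-style UTF-8 and simply has no UTF-16 form.
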
So you should rather use UCS-4 for a true one-to-one length-to-character 
mapping. But wait, then there are combining characters, modifiers, and the 
famous BOM (U+FEFF, zero width no-break space), which doesn't count, or at 
least not really. You'll never get your good old bytes=chars behaviour back.
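
In Python a str is indexed by code point, so len() behaves like counting UCS-4 
units. That makes it a convenient way to show that even one unit per code 
point does not mean one unit per visible character. The sample strings are 
arbitrary.

```python
import unicodedata

precomposed = "\u00e9"       # é as a single code point
decomposed = "e\u0301"       # e + COMBINING ACUTE ACCENT, renders the same
with_bom = "\ufeff" + "abc"  # leading BOM (ZERO WIDTH NO-BREAK SPACE)

# len() counts code points, i.e. UCS-4 units, not what a reader sees.
print(len(precomposed), len(decomposed), len(with_bom))   # 1 2 4

# Same visible character, different code point sequences.
print(precomposed == decomposed)                                # False
# NFC normalization composes e + accent back into U+00E9.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True

# UTF-32 spends four bytes per code point, so two for one visible "é".
print(len(decomposed.encode("utf-32-le")))  # 8
```
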

In the end, it is absolutely irrelevant how the stuff is stored 
internally. For transfer, UTF-8 is a great thing: plain ASCII text stays 
byte-for-byte ASCII, which makes it far more comprehensible to the casual 
eye than UTF-16.
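
The transfer argument is easy to see on the bytes themselves. In this Python 
sketch (the sample string is arbitrary), ASCII text encoded as UTF-8 is 
identical to plain ASCII, while UTF-16 puts a zero byte next to every letter.

```python
text = "Hello, world"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")   # explicit byte order, so no BOM is added

print(utf8)        # b'Hello, world'  (same bytes as plain ASCII)
print(utf16[:8])   # b'H\x00e\x00l\x00l\x00'

assert utf8 == text.encode("ascii")
assert len(utf16) == 2 * len(utf8)  # every ASCII char doubles in size
```
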

-- 
CU
        Joerg

PGP Public Key at http://ich.bin.kein.hoschi.de/~trouble/public_key.asc
PGP Key fingerprint = D34F 57C4 99D8 8F16 E16E  7779 CDDC 41A4 4C48 6F94

