Followup to:  <[EMAIL PROTECTED]>
By author:    Markus Kuhn <[EMAIL PROTECTED]>
In newsgroup: linux.utf8
> 
> I'd ignore that part. There are now two different UTFs which are both
> called UTF-8, the one in Unicode (up to 4-byte sequences) and the one in
> UCS (up to 6-byte sequences). They are upwards compatible, and I don't
> think any harm will be done by implementing the more comprehensive one.
> 
> With changed conformance requirements, I meant the fact that conforming
> UTF-8 decoders must not accept overlong representations of characters
> for which a shorter UTF-8 sequence would be possible. This was
> explicitly allowed in Unicode 3.0 and is now explicitly forbidden in 3.1.
> 

The UTF-8 as written up by the Unicode people mostly seems to be a
codification of the "no characters above U+10FFFF" rule, U+10FFFF
being the highest code point UTF-16 can reach.  Personally, I think
UTF-16 is a huge disaster, and this is just One More Reason Why.
It's sad to see it catered to, but I'm guessing I know the reason
(and it has nine letters, one of which is frequently written $).
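
To make the difference between the two variants, and the overlong
rule quoted above, concrete, here is a rough sketch in C.  It is only
an illustration under my own assumptions: utf8_decode() and its
max_cp parameter are hypothetical names, not taken from any standard
or library, and surrogate code points are deliberately not checked.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one UTF-8 sequence from s (n bytes available).
 * Returns the sequence length and stores the code point in *out,
 * or returns 0 if the sequence is malformed, overlong, or above max_cp.
 *
 * max_cp is an illustrative knob: 0x10FFFF gives the Unicode-style
 * limit (at most 4 bytes), 0x7FFFFFFF the UCS-style one (up to 6 bytes).
 */
size_t utf8_decode(const unsigned char *s, size_t n,
                   uint32_t max_cp, uint32_t *out)
{
    /* Smallest code point that genuinely needs a sequence of each length. */
    static const uint32_t min_cp[7] = {
        0, 0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000
    };
    uint32_t cp;
    size_t len, i;

    if (n == 0)
        return 0;

    /* The lead byte determines the length and the first payload bits. */
    if (s[0] < 0x80)                { len = 1; cp = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; cp = s[0] & 0x07; }
    else if ((s[0] & 0xFC) == 0xF8) { len = 5; cp = s[0] & 0x03; }
    else if ((s[0] & 0xFE) == 0xFC) { len = 6; cp = s[0] & 0x01; }
    else
        return 0;       /* stray continuation byte, 0xFE or 0xFF */

    if (len > n)
        return 0;       /* truncated input */

    /* Each continuation byte must be 10xxxxxx and adds six bits. */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }

    if (cp < min_cp[len])
        return 0;       /* overlong: a shorter sequence would do */
    if (cp > max_cp)
        return 0;       /* beyond the chosen repertoire limit */

    *out = cp;
    return len;
}
```

With max_cp set to 0x10FFFF the bytes F4 90 80 80 (U+110000) are
rejected, while with 0x7FFFFFFF they are accepted; the overlong pair
C0 AF (an encoding of "/") is rejected either way.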

        -hpa
-- 
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/
