Roozbeh Pournader wrote on 2001-04-04 13:55 UTC:
> > Recommended activities:
> >
> > - Check your UTF-8 decoders for the changed conformance requirements
>
> So we should not accept ISO 10646 five- and six-byte sequences anymore.
> Am I correct?
I'd ignore that part. There are now two different UTFs which are both
called UTF-8, the one in Unicode (up to 4-byte sequences) and the one in
UCS (up to 6-byte sequences). They are upwards compatible, and I don't
think any harm will be done by implementing the more comprehensive one.
By changed conformance requirements, I meant that conforming UTF-8
decoders must not accept overlong representations, that is, sequences
that encode a character for which a shorter UTF-8 sequence exists.
This was explicitly allowed in Unicode 3.0 and is now explicitly
forbidden in 3.1.
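
To make the overlong check concrete, here is a minimal sketch in Python.
It is only an illustration, not code taken from any particular decoder.
The names utf8_min_length, decode_one and max_len are hypothetical.
It decodes one sequence, accepts up to 6 bytes by default (the UCS
variant), and rejects any sequence longer than the shortest possible
encoding of the decoded value. It does not check for surrogates or
other restrictions.

```python
def utf8_min_length(cp):
    """Return the byte count of the shortest UTF-8 encoding of cp."""
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    if cp < 0x200000:
        return 4
    if cp < 0x4000000:
        return 5
    return 6


def decode_one(data, max_len=6):
    """Decode the first UTF-8 sequence in data (a bytes object).

    Returns (code_point, sequence_length). Raises ValueError for
    malformed, truncated, too-long or overlong sequences.
    max_len is a hypothetical knob: 6 for the UCS variant,
    4 for the Unicode variant.
    """
    if not data:
        raise ValueError("empty input")
    first = data[0]
    if first < 0x80:
        return first, 1
    # The lead byte determines the sequence length and the top payload bits.
    if first & 0xE0 == 0xC0:
        n, cp = 2, first & 0x1F
    elif first & 0xF0 == 0xE0:
        n, cp = 3, first & 0x0F
    elif first & 0xF8 == 0xF0:
        n, cp = 4, first & 0x07
    elif first & 0xFC == 0xF8:
        n, cp = 5, first & 0x03
    elif first & 0xFE == 0xFC:
        n, cp = 6, first & 0x01
    else:
        raise ValueError("invalid lead byte")
    if n > max_len:
        raise ValueError("sequence longer than max_len")
    if len(data) < n:
        raise ValueError("truncated sequence")
    # Each continuation byte must look like 10xxxxxx and adds 6 bits.
    for b in data[1:n]:
        if b & 0xC0 != 0x80:
            raise ValueError("invalid continuation byte")
        cp = (cp << 6) | (b & 0x3F)
    # The conformance check: a shorter encoding must not exist.
    if utf8_min_length(cp) != n:
        raise ValueError("overlong representation")
    return cp, n
```

For example, decode_one(b"/") returns (0x2F, 1), while the two-byte
form b"\xc0\xaf" of the same character raises ValueError. Calling
decode_one with max_len=4 would restrict the sketch to the shorter
Unicode variant.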
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/