The Unicode Consortium has just published Unicode 3.1 at
http://www.unicode.org/unicode/reports/tr27/
- The definition of UTF-8 has been changed such that decoders must not
accept overlong sequences (something we had long been asking for)
- lots of historic ideographs have been added in Plane 02
- several historic scripts and style variants of the Latin and Greek
alphabets for use as mathematical symbols have been added to Plane 01,
along with characters for typesetting of music textbooks
- language tags have been added to Plane 0E
- two characters were added to the BMP
- various clarifications in the text have been made
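To illustrate what the overlong restriction means for a decoder, here is a minimal sketch in Python. It is not taken from TR27; the function name decode_utf8_strict and the table layout are my own assumptions. It checks only the lead byte, the continuation bytes, the minimum code point for each sequence length, and the U+10FFFF upper limit.

```python
def decode_utf8_strict(data):
    """Decode UTF-8 bytes into a list of code points, rejecting overlong forms.

    Hypothetical helper for illustration only; other conformance checks
    are deliberately left out of this sketch.
    """
    # (lead-byte mask, lead-byte pattern, continuation bytes, smallest legal value)
    forms = [
        (0x80, 0x00, 0, 0x0000),
        (0xE0, 0xC0, 1, 0x0080),
        (0xF0, 0xE0, 2, 0x0800),
        (0xF8, 0xF0, 3, 0x10000),
    ]
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        for mask, pattern, ntrail, minimum in forms:
            if (lead & mask) == pattern:
                break
        else:
            raise ValueError("invalid lead byte 0x%02X at offset %d" % (lead, i))
        if i + ntrail >= len(data):
            raise ValueError("truncated sequence at offset %d" % i)
        cp = lead & ~mask & 0xFF          # payload bits of the lead byte
        for b in data[i + 1:i + 1 + ntrail]:
            if (b & 0xC0) != 0x80:
                raise ValueError("bad continuation byte at offset %d" % i)
            cp = (cp << 6) | (b & 0x3F)
        if cp < minimum:
            # The value would have fit into a shorter sequence.
            raise ValueError("overlong sequence at offset %d" % i)
        if cp > 0x10FFFF:
            raise ValueError("code point out of range at offset %d" % i)
        out.append(cp)
        i += 1 + ntrail
    return out
```

With this sketch, the two-byte form C0 80 (an overlong NUL) and the three-byte form E0 80 AF (an overlong slash) are both rejected, while ordinary sequences decode as before.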
Recommended activities:
- Regenerate all your tables derived from Unicode databases and check
whether your table-handling code can cope with the full UTF-32
repertoire (code points beyond the BMP)
- Check your UTF-8 decoders for the changed conformance requirements
- Check that your software (output methods, etc.) properly ignores
Plane 0E language tags unless it interprets them, as required by
the new version of the standard
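For the language tag item, a rough sketch of an output filter in Python follows. The name strip_language_tags is hypothetical, and I am assuming that dropping the whole Plane 0E tag block (U+E0000 to U+E007F) is acceptable for software that does not interpret tags.

```python
# Range of the Plane 0E tag block; an assumption of this sketch is that
# everything in it can simply be dropped before display.
TAG_FIRST = 0xE0000
TAG_LAST = 0xE007F


def strip_language_tags(text):
    """Return text with all Plane 0E tag characters removed."""
    return "".join(ch for ch in text if not (TAG_FIRST <= ord(ch) <= TAG_LAST))
```

For example, a string containing U+E0001 followed by the tag letters for "en" comes out with only its ordinary characters left.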
To be honest, I don't think that support for non-BMP characters in
terminal emulators is a particularly urgent issue, as the non-BMP
characters are unlikely to be of any real use to the vast majority of
terminal emulator users. It might make more sense to provide options to
map the mathematical style variants to their normal equivalents for
display, with proper text attributes (bold and italics) activated. I
don't like the idea that, for instance, xterm should by default open three
fonts for all three occupied planes just because they are now occupied.
In case of doubt, I think performance should be given priority over
support for non-BMP planes.
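The mapping of mathematical style variants to normal characters plus attributes could look roughly like the following Python sketch. This is only an illustration under my own assumptions: the function name plain_with_attributes is made up, it relies on the <font> compatibility decompositions in the Unicode character database as exposed by Python's unicodedata module, and it guesses bold and italic from the character name, which a real terminal emulator would more likely do with a precomputed table.

```python
import unicodedata

# Plane 01 range holding the mathematical alphanumeric symbols.
MATH_FIRST = 0x1D400
MATH_LAST = 0x1D7FF


def plain_with_attributes(ch):
    """Return (base_text, bold, italic) for a single character.

    Characters outside the mathematical alphanumeric range, or without a
    <font> decomposition, are returned unchanged with no attributes.
    """
    if not (MATH_FIRST <= ord(ch) <= MATH_LAST):
        return ch, False, False
    decomp = unicodedata.decomposition(ch)
    if not decomp.startswith("<font>"):
        return ch, False, False
    # The decomposition looks like "<font> 0041"; convert the hex fields.
    base = "".join(chr(int(cp, 16)) for cp in decomp.split()[1:])
    words = unicodedata.name(ch, "").split()
    return base, "BOLD" in words, "ITALIC" in words
```

A display routine could then draw the base character from the ordinary BMP font with the returned attributes switched on, so no extra font needs to be opened for Plane 01.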
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/