The Unicode Consortium has just published Unicode 3.1 at

  http://www.unicode.org/unicode/reports/tr27/

- The definition of UTF-8 has been changed such that decoders must not
  accept overlong sequences (something we had been asking for for a
  long time)

- lots of historic ideographs have been added in Plane 02

- several historic scripts and style variants of the Latin and Greek
  alphabets for use as mathematical symbols have been added to Plane 01
  along with characters for typesetting music textbooks

- language tags have been added to Plane 0E

- two characters were added to the BMP

- various clarifications in the text have been made
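
To illustrate the first point, here is a minimal sketch of a decoder
that rejects overlong sequences by comparing each decoded value with
the smallest value that legitimately needs that many bytes. The name
decode_utf8_strict is made up for this example, and the sketch leaves
out other checks a real decoder needs (surrogates, for instance).

```python
def decode_utf8_strict(data: bytes) -> list:
    """Decode UTF-8 into a list of code points, rejecting overlong forms."""
    # (lead-byte mask, lead-byte pattern, continuation bytes, smallest legal value)
    forms = [
        (0x80, 0x00, 0, 0x0000),
        (0xE0, 0xC0, 1, 0x0080),
        (0xF0, 0xE0, 2, 0x0800),
        (0xF8, 0xF0, 3, 0x10000),
    ]
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        for mask, pattern, ncont, minimum in forms:
            if (lead & mask) == pattern:
                break
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        tail = data[i + 1 : i + 1 + ncont]
        if len(tail) != ncont:
            raise ValueError(f"truncated sequence at offset {i}")
        # Payload bits of the lead byte are the ones outside the mask.
        cp = lead & (~mask & 0xFF)
        for b in tail:
            if (b & 0xC0) != 0x80:
                raise ValueError(f"bad continuation byte at offset {i}")
            cp = (cp << 6) | (b & 0x3F)
        if cp < minimum:
            # The value would have fitted into a shorter sequence.
            raise ValueError(f"overlong sequence at offset {i}")
        if cp > 0x10FFFF:
            raise ValueError(f"value beyond U+10FFFF at offset {i}")
        out.append(cp)
        i += 1 + ncont
    return out
```

With this, the classic overlong encoding of "/" (C0 AF) raises
ValueError, while F0 9D 90 80 decodes to U+1D400 in Plane 01.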

Recommended activities:

- Regenerate all your tables derived from Unicode databases and check
  whether your code can handle the full UTF-32 repertoire, i.e. code
  points beyond U+FFFF

- Check your UTF-8 decoders for the changed conformance requirements

- Check that your software (output methods, etc.) properly ignores
  Plane 0E language tags unless it interprets them, as required by
  the new version of the standard
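
For the last point, an output method that does not interpret language
tags could simply drop them before display. The following is only a
sketch; the function names are invented here, and the ranges are the
Plane 0E tag characters U+E0001 and U+E0020..U+E007F.

```python
def is_tag_character(cp: int) -> bool:
    """True for U+E0001 LANGUAGE TAG and the tag characters U+E0020..U+E007F."""
    return cp == 0xE0001 or 0xE0020 <= cp <= 0xE007F


def strip_language_tags(text: str) -> str:
    """Return text with all Plane 0E tag characters removed."""
    return "".join(ch for ch in text if not is_tag_character(ord(ch)))
```

A string that starts with LANGUAGE TAG followed by the tag letters
"en" comes out as the plain text alone.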

To be honest, I don't think that support for non-BMP characters in
terminal emulators is a particularly urgent issue, as the non-BMP
characters are unlikely to be of any real use to the vast majority of
terminal emulator users. It might make more sense to provide options to
map the mathematical style variants to their normal equivalents for
display, with proper text attributes (bold and italics) activated. I
don't like the idea that, for instance, xterm should by default open three
fonts for all three occupied planes now that they have been occupied.
In case of doubt, I think performance should be given priority over
support for non-BMP planes.
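
One way such a mapping option might work, sketched here under the
assumption that the <font> compatibility decompositions and the
character names in the Unicode database are an acceptable source for
the base character and its attributes. fold_math_variant is a
hypothetical helper, not anything xterm provides.

```python
import unicodedata


def fold_math_variant(ch: str):
    """Return (base_char, bold, italic) for a single character.

    Characters outside the Mathematical Alphanumeric Symbols block
    (U+1D400..U+1D7FF) are returned unchanged with both flags False.
    """
    cp = ord(ch)
    if not 0x1D400 <= cp <= 0x1D7FF:
        return ch, False, False
    decomp = unicodedata.decomposition(ch)
    if not decomp.startswith("<font> "):
        # Unassigned code point in the block, or no font decomposition.
        return ch, False, False
    base = chr(int(decomp.split()[1], 16))
    # Derive the attributes from the name, e.g.
    # "MATHEMATICAL BOLD ITALIC CAPITAL A".
    words = unicodedata.name(ch, "").split()
    return base, "BOLD" in words, "ITALIC" in words
```

A terminal emulator could then draw the base character from its
ordinary BMP font with the bold or italic attribute set, rather than
opening an extra font for Plane 01. Styles such as script or Fraktur
have no matching text attribute and fall back to the plain letter in
this sketch.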

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/
