Implementation proposal (surely incomplete, so check carefully):
- Xterm should be modified to parse Plane 14 language tags and preserve
the last seen complete language tag as terminal state information.
No glyphs will be written for any Plane 14 tag characters (U-000E0000
.. U-000E007F). wcwidth() of all Plane 14 tag characters will be set
to 0 in glibc, but xterm will not treat Plane 14 characters like other
non-spacing characters.
- Xterm should keep a cache of the 16 most recently seen language tags and
assign them cache slot numbers.
- The initial language tag is "" and it is permanently assigned to
slot number 0.
- Xterm should associate the cache slot number (4-bit) of the last seen
complete language tag with every character that is newly written into the
cell matrix.
- If a language tag has to be removed from the cache, then all characters
in the cell matrix associated with that slot are reassigned to slot 0
before the language tag slot is reused.
- Xterm should associate with every filled language tag slot a set of
opened fonts (normal, bold, wide, etc.). If the language tag (RFC 1766)
has the form xx-yy or just xx, then xterm will attempt to open the
current fonts with the ADD_STYLE_NAME values
"xx_YY"
"xx"
"xx_*"
""
"*"
in that order to find fonts for that slot (after checking whether the
same font has already been opened for another slot). Note: For
performance reasons, it is important that fonts with identical XLFD are
never opened twice, even if they are associated with different language
tag slots.
- ESC sequences that reposition the cursor always reset the current
language tag to slot 0 (the empty tag ""). LF, CR, FF, BS, VT, HT, etc.
will not change the current language tag.
- Xlib will be modified to support the addition of locale-dependent
language tags in a CTEXT -> UTF8_STRING conversion.
- Xlib will be modified to take into account the information from
language tags in a UTF8_STRING -> CTEXT conversion.
- xterm will only support UTF8_STRING selections in UTF-8 mode. Anything
else will be converted by Xlib.
- the non-8-bit mode of Xterm will remain completely free from any
ISO 2022 related code. Any ISO 2022 related processing for communication
with legacy applications will be handled completely in the C or X11
library.
http://www.unicode.org/unicode/reports/tr7/
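A rough C sketch of the tag parsing step, to make the first point more concrete. It assumes that a tag counts as "complete" once the first non-tag character follows it, that CANCEL TAG (U+E007F) falls back to "", and that a pending tag is dropped on cursor movement; tag_state, tag_feed() and tag_cursor_moved() are hypothetical names, not anything from the xterm sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TAG_MAX 64  /* hypothetical cap on stored tag length */

/* Hypothetical parser state; not taken from the xterm sources. */
typedef struct {
    char current[TAG_MAX];  /* last complete language tag, initially "" */
    char pending[TAG_MAX];  /* tag characters collected since U+E0001 */
    size_t pending_len;
    int collecting;         /* nonzero between U+E0001 and the end of the tag */
} tag_state;

static void tag_init(tag_state *s) { memset(s, 0, sizeof *s); }

/* U+E0000..U+E007F: the Plane 14 range for which no glyphs are written. */
static int is_plane14_tag(uint32_t c) { return c >= 0xE0000 && c <= 0xE007F; }

/* Feed one decoded character. Returns 1 if c is not a Plane 14 tag
   character (normal processing continues), 0 if it was consumed here. */
static int tag_feed(tag_state *s, uint32_t c) {
    if (!is_plane14_tag(c)) {
        if (s->collecting) {
            /* assumption: the first non-tag character completes the tag */
            memcpy(s->current, s->pending, s->pending_len);
            s->current[s->pending_len] = '\0';
            s->collecting = 0;
        }
        return 1;
    }
    if (c == 0xE0001) {                 /* LANGUAGE TAG: start a new tag */
        s->collecting = 1;
        s->pending_len = 0;
    } else if (c == 0xE007F) {          /* CANCEL TAG: back to "" */
        s->collecting = 0;
        s->current[0] = '\0';
    } else if (s->collecting && c >= 0xE0020 && c <= 0xE007E) {
        /* tag characters mirror printable ASCII: U+E0061 is 'a', etc. */
        if (s->pending_len < TAG_MAX - 1)
            s->pending[s->pending_len++] = (char)(c - 0xE0000);
    }
    /* any other Plane 14 value is dropped without producing a glyph */
    assert(s->pending_len < TAG_MAX);
    return 0;
}

/* Cursor-positioning escape sequences reset the tag to "" (slot 0). */
static void tag_cursor_moved(tag_state *s) {
    s->collecting = 0;
    s->current[0] = '\0';
}
```

Because tag_feed() sits in front of the normal character path, a tag character never reaches the combining-character logic, even with glibc's wcwidth() returning 0 for it as proposed.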
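One possible shape for the slot cache, as a C sketch. It reads "16 language tags" as 16 slots in total (the 4-bit range 0..15), with slot 0 pinned to "" and slots 1..15 replaced least recently used first; the LRU policy, the fixed 24x80 per-cell array and all names (tag_cache, cache_lookup()) are assumptions for illustration only. A real xterm would more likely pack the 4 bits into its existing per-cell attributes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NSLOTS 16   /* 4-bit slot number: 0..15 */
#define TAGLEN 64   /* hypothetical cap on tag length */
#define ROWS 24     /* hypothetical fixed screen size */
#define COLS 80

typedef struct {
    char tag[NSLOTS][TAGLEN];
    int used[NSLOTS];
    unsigned long last_use[NSLOTS];  /* LRU timestamps */
    unsigned long clock;
    uint8_t cell_slot[ROWS][COLS];   /* slot of each cell; only 4 bits needed */
} tag_cache;

static void cache_init(tag_cache *c) {
    memset(c, 0, sizeof *c);  /* every cell starts in slot 0 */
    c->used[0] = 1;           /* slot 0 permanently holds "" */
}

/* Return the slot for tag, loading it into a free or least recently used
   slot (never slot 0) if it is not cached yet. */
static int cache_lookup(tag_cache *c, const char *tag) {
    int victim = -1;
    c->clock++;
    if (tag[0] == '\0')
        return 0;
    for (int i = 1; i < NSLOTS; i++)
        if (c->used[i] && strcmp(c->tag[i], tag) == 0) {
            c->last_use[i] = c->clock;
            return i;
        }
    for (int i = 1; i < NSLOTS; i++) {
        if (!c->used[i]) { victim = i; break; }  /* a free slot wins */
        if (victim < 0 || c->last_use[i] < c->last_use[victim])
            victim = i;
    }
    assert(victim > 0 && victim < NSLOTS);
    if (c->used[victim]) {
        /* evicting: cells still tagged with this slot fall back to slot 0;
           a real implementation would also release the slot's fonts here */
        for (int r = 0; r < ROWS; r++)
            for (int col = 0; col < COLS; col++)
                if (c->cell_slot[r][col] == victim)
                    c->cell_slot[r][col] = 0;
    }
    strncpy(c->tag[victim], tag, TAGLEN - 1);
    c->tag[victim][TAGLEN - 1] = '\0';
    c->used[victim] = 1;
    c->last_use[victim] = c->clock;
    return victim;
}
```

The full-screen scan on eviction is only paid when a 16th distinct tag shows up, which should be rare in practice; normal writes just store the current slot number in the cell.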
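The font search order can be sketched in C as well. style_candidates() produces the ADD_STYLE_NAME values in the order listed above, xlfd_set_style() substitutes one of them into the ADD_STYLE_NAME field (the sixth XLFD field, between the sixth and seventh '-'), and already_opened() is a hypothetical registry that keeps identical XLFDs from being opened twice. Treating any tag that is not of the form xx or xx-yy as "" plus "*" only is my own assumption; all function names are made up.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAXCAND 5
#define STYLELEN 16
#define MAXOPEN 64
#define XLFDLEN 256

static int two_alpha(const char *p) {
    return isalpha((unsigned char)p[0]) && isalpha((unsigned char)p[1]);
}

/* Fill out[] with ADD_STYLE_NAME values in the proposed search order and
   return how many were written. */
static int style_candidates(const char *tag, char out[MAXCAND][STYLELEN]) {
    size_t len = strlen(tag);
    int xx_yy = len == 5 && tag[2] == '-' && two_alpha(tag) && two_alpha(tag + 3);
    int n = 0;
    if (xx_yy || (len == 2 && two_alpha(tag))) {
        char xx[3] = { (char)tolower((unsigned char)tag[0]),
                       (char)tolower((unsigned char)tag[1]), '\0' };
        if (xx_yy)  /* "en-us" -> "en_US" */
            snprintf(out[n++], STYLELEN, "%s_%c%c", xx,
                     toupper((unsigned char)tag[3]), toupper((unsigned char)tag[4]));
        snprintf(out[n++], STYLELEN, "%s", xx);
        snprintf(out[n++], STYLELEN, "%s_*", xx);
    }
    out[n++][0] = '\0';                 /* "" */
    snprintf(out[n++], STYLELEN, "*");
    return n;
}

/* Copy xlfd to out with its ADD_STYLE_NAME field replaced by style.
   Returns 0 on success, -1 if xlfd has too few fields or out is too small. */
static int xlfd_set_style(const char *xlfd, const char *style,
                          char *out, size_t outsz) {
    const char *start = NULL, *end = NULL;
    int dashes = 0;
    for (const char *p = xlfd; *p; p++) {
        if (*p != '-')
            continue;
        dashes++;
        if (dashes == 6) start = p + 1;
        else if (dashes == 7) { end = p; break; }
    }
    if (!start || !end)
        return -1;
    int n = snprintf(out, outsz, "%.*s%s%s", (int)(start - xlfd), xlfd, style, end);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}

/* Hypothetical registry of XLFDs already opened for any slot. */
static char opened[MAXOPEN][XLFDLEN];
static int nopened;

/* Return 1 if xlfd was opened before, else record it and return 0. */
static int already_opened(const char *xlfd) {
    for (int i = 0; i < nopened; i++)
        if (strcmp(opened[i], xlfd) == 0)
            return 1;
    if (nopened < MAXOPEN)
        snprintf(opened[nopened++], XLFDLEN, "%s", xlfd);
    return 0;
}
```

In xterm, each candidate XLFD that is not already registered would then be handed to XLoadQueryFont(). Since a wildcard pattern such as "*" can match the same font as a more specific one, comparing the name the server actually resolves (for example via XListFonts()) rather than the requested pattern would catch more duplicates.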
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/