Handling Plane 14 language tags in UTF-8 xterm

Markus Kuhn Thu, 08 Feb 2001 08:14:14 -0800
Implementation proposal (surely incomplete, so check carefully):

  - Xterm should be modified to parse Plane 14 language tags and preserve
    the last seen complete language tag as terminal state information.
    No glyphs will be written for any Plane 14 tag characters (U-000E0000
    .. U-000E007F). wcwidth() of all Plane 14 tag characters will be set
    to 0 in glibc, but xterm will not treat Plane 14 characters like other
    non-spacing characters.

  - Xterm should keep a cache of the last 16 seen language tags and assign
    them cache slot numbers.

  - The initial language tag is "" and it is permanently assigned to
    slot number 0.

  - Xterm should associate the cache slot number (4-bit) of the last seen
    complete language tag with every character that is newly written into the
    cell matrix.

  - If a language tag has to be removed from the cache, then all characters
    in the charcell matrix associated with that slot get assigned to slot 0
    before the language tag slot is reused.

  - Xterm should associate a set of opened fonts (normal, bold, wide, etc.)
    with every filled language tag slot. If the language tag (RFC 1766)
    has the form xx-yy or just xx, then xterm will attempt to open the
    current fonts with the ADD_STYLE_NAME values

       "xx_YY"
       "xx"
       "xx_*"
       ""
       "*"

    in that order to find fonts for that slot (after checking whether the
    same font has already been opened for another slot). Note: For
    performance reasons, it is important that fonts with identical XLFD are
    never opened twice, even if they are associated with different language
    tag slots.

  - ESC sequences that reposition the cursor always reset the current
    language tag to slot 0. LF, CR, FF, BS, VT, HT, etc. will not change
    the current language tag.

  - Xlib will be modified to support the addition of locale-dependent
    language tags in a CTEXT -> UTF8_STRING conversion.

  - Xlib will be modified to take into account the information from
    language tags in a UTF8_STRING -> CTEXT conversion.

  - xterm will only support UTF8_STRING selections in UTF-8 mode. Anything
    else will be converted by Xlib.

  - the non-8-bit mode of Xterm will remain completely free from any
    ISO 2022 related code. Any ISO 2022 related processing for communication
    with legacy applications will be handled completely in the C or X11
    library.
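
The slot cache and the ADD_STYLE_NAME lookup order described above could be
sketched roughly as follows. This is Python for brevity (xterm itself is C),
and the names TagSlotCache, slot_for and add_style_candidates are
hypothetical. The proposal does not say which slot to evict when the cache
is full, so least-recently-used is assumed here. It also does not say what
to do with tags that are neither xx nor xx-yy, so those fall back to ""
and "*" only.

```python
NUM_SLOTS = 16  # slot numbers fit into 4 bits


class TagSlotCache:
    """Cache of the last seen language tags; slot 0 is permanently ""."""

    def __init__(self):
        self.tags = [""] + [None] * (NUM_SLOTS - 1)
        self.last_used = [0] * NUM_SLOTS
        self.clock = 0

    def slot_for(self, tag, cells):
        """Return the slot for `tag`.

        `cells` is a flat list holding the slot number of every cell in
        the character matrix. If a slot has to be reused, all cells that
        refer to it are first reassigned to slot 0.
        """
        self.clock += 1
        if tag in self.tags:
            slot = self.tags.index(tag)
        else:
            if None in self.tags:
                slot = self.tags.index(None)
            else:
                # Assumption: evict the least recently used slot (never 0).
                slot = min(range(1, NUM_SLOTS),
                           key=lambda s: self.last_used[s])
                for i, s in enumerate(cells):
                    if s == slot:
                        cells[i] = 0
            self.tags[slot] = tag
        self.last_used[slot] = self.clock
        return slot


def add_style_candidates(tag):
    """ADD_STYLE_NAME values to try, in order, for an RFC 1766 tag."""
    parts = tag.split("-")
    if len(parts) == 2 and all(parts):
        lang, region = parts[0].lower(), parts[1].upper()
        return [f"{lang}_{region}", lang, f"{lang}_*", "", "*"]
    if len(parts) == 1 and tag:
        lang = tag.lower()
        return [lang, f"{lang}_*", "", "*"]
    # Assumption: any other tag form only gets the generic fallbacks.
    return ["", "*"]
```

A real implementation would also keep a table of already opened fonts keyed
by XLFD, so that two slots resolving to the same font share one handle.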

http://www.unicode.org/unicode/reports/tr7/
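
For illustration, the tag parsing step might look like the sketch below,
assuming the usual Plane 14 encoding: U+E0001 LANGUAGE TAG introduces a tag,
U+E0020 .. U+E007E carry the ASCII characters of the tag offset by 0xE0000,
and U+E007F CANCEL TAG resets the language. The names TagParser, feed and
annotate are hypothetical. A tag is treated as complete when the first
non-tag character follows it, which is one reading of "last seen complete
language tag".

```python
LANGUAGE_TAG = 0xE0001
CANCEL_TAG = 0xE007F
TAG_BASE = 0xE0000


def is_tag_char(cp):
    """True for every Plane 14 tag character (U+E0000 .. U+E007F)."""
    return 0xE0000 <= cp <= 0xE007F


class TagParser:
    def __init__(self):
        self.current = ""     # last complete language tag
        self._pending = None  # tag being collected, or None

    def feed(self, cp):
        """Consume one code point; return True if it should be drawn."""
        if is_tag_char(cp):
            if cp == LANGUAGE_TAG:
                self._pending = ""
            elif cp == CANCEL_TAG:
                # Covers both "LANGUAGE TAG, CANCEL TAG" and a bare cancel.
                self.current = ""
                self._pending = None
            elif self._pending is not None and 0xE0020 <= cp <= 0xE007E:
                self._pending += chr(cp - TAG_BASE)
            return False  # tag characters never produce a glyph
        if self._pending:
            self.current = self._pending  # tag is now complete
        self._pending = None
        return True


def annotate(text):
    """Pair every visible character with the language tag in effect."""
    parser = TagParser()
    out = []
    for ch in text:
        if parser.feed(ord(ch)):
            out.append((ch, parser.current))
    return out
```

In xterm, the tag returned for each visible character would then be mapped
to a 4-bit slot number before the character is stored in its cell.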

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/