Re: Handling Plane 14 language tags in UTF-8 xterm

Robert Brady Thu, 08 Feb 2001 09:25:34 -0800
On Thu, 8 Feb 2001, Markus Kuhn wrote:

> Implementation proposal (surely incomplete, so check carefully):
> 
>   - Xterm should be modified to parse Plane 14 language tags and preserve
>     the last seen complete language tag as terminal state information.
>     No glyphs will be written for any Plane 14 tag characters (U-000E0000
>     .. U-000E007F). wcwidth() of all Plane 14 tag characters will be set
>     to 0 in glibc, but xterm will not treat Plane 14 characters like other
>     non-spacing characters.

Agreed. We can do the first part of this - ignoring such characters in
xterm with virtually no hassle at all.
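
To make the "parse, remember, draw nothing" part concrete, here is a rough
sketch in Python (xterm itself is C; the class and method names are made up
for illustration). It assumes a tag counts as complete once a non-tag
character follows it, and that CANCEL TAG returns the state to "".

```python
# Illustrative sketch only; none of these names exist in xterm's source.
LANGUAGE_TAG = 0xE0001   # U+E0001 LANGUAGE TAG, introduces a language tag
CANCEL_TAG = 0xE007F     # U+E007F CANCEL TAG
PLANE14_FIRST, PLANE14_LAST = 0xE0000, 0xE007F


class TagFilter:
    """Strip Plane 14 tag characters and remember the last complete tag."""

    def __init__(self):
        self.current = ""     # last complete language tag ("" = none)
        self._pending = None  # tag being collected, or None

    def _finish(self):
        # Promote a tag that was being collected to the current tag.
        if self._pending is not None:
            self.current = self._pending
            self._pending = None

    def feed(self, text):
        """Return the printable part of text; tag characters draw nothing."""
        out = []
        for ch in text:
            cp = ord(ch)
            if not (PLANE14_FIRST <= cp <= PLANE14_LAST):
                self._finish()          # a non-tag character ends the tag
                out.append(ch)
            elif cp == LANGUAGE_TAG:
                self._finish()
                self._pending = ""      # start collecting a new tag
            elif cp == CANCEL_TAG:
                self._pending = None
                self.current = ""       # assumption: cancel returns to ""
            elif 0xE0020 <= cp <= 0xE007E and self._pending is not None:
                self._pending += chr(cp - 0xE0000)  # tag char -> ASCII
            # any other Plane 14 character is silently dropped
        return "".join(out)
```

After feeding a tagged string, `current` holds the tag that would be
attached to newly written cells.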

>   - Xterm should keep a cache of the last 16 seen language tags and assign
>     them cache slot numbers.
> 
>   - The initial language tag is "" and it is permanently assigned to
>     slot number 0.
> 
>   - Xterm should associate the cache slot number (4-bit) of the last seen
>     complete language tag with every character that is newly written into the
>     cell matrix.

That sounds good.  This doesn't expand it very much (well, not compared to
the bloat I added :), and we get to keep 4 bits left over for assorted
other purposes. (Hm, italics?)
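
As a sketch of the arithmetic, assuming the slot number lives in one extra
byte per cell (the byte layout and the helper names are my assumption, not
anything in xterm): the low four bits hold the slot, the high four are spare.

```python
# Hypothetical per-cell byte layout: bits 0-3 = language slot, bits 4-7 = spare.
SLOT_BITS = 4
SLOT_MASK = (1 << SLOT_BITS) - 1   # 0x0F


def pack(slot, spare=0):
    """Combine a 4-bit slot number and 4 spare bits into one byte."""
    if not 0 <= slot <= SLOT_MASK or not 0 <= spare <= 0x0F:
        raise ValueError("slot and spare must each fit in 4 bits")
    return (spare << SLOT_BITS) | slot


def slot_of(cell_byte):
    """Extract the language tag slot from a packed byte."""
    return cell_byte & SLOT_MASK


def spare_of(cell_byte):
    """Extract the four spare bits (e.g. a hypothetical italics flag)."""
    return cell_byte >> SLOT_BITS
```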

>   - If a language tag has to be removed from the cache, then all characters
>     in the charcell matrix associated with that slot get assigned to slot 0
>     before the language tag slot is reused.
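
A minimal sketch of the slot cache with this reset-on-eviction rule, in
Python for brevity. The proposal does not say which slot to evict; this
assumes least-recently-seen, and the cell matrix is simplified to a flat
list of slot numbers. All names are hypothetical.

```python
class TagCache:
    """Fixed-size cache of language tags; slot 0 is permanently ""."""

    def __init__(self, size=16):
        self.tags = [""] + [None] * (size - 1)
        self.last_used = [0] * size
        self.clock = 0

    def slot_for(self, tag, cells):
        """Return the slot for tag, evicting the least recently seen if full.

        cells is a flat list of per-cell slot numbers (the cell matrix).
        """
        self.clock += 1
        if tag in self.tags:
            slot = self.tags.index(tag)
        elif None in self.tags:
            slot = self.tags.index(None)       # a free slot is available
        else:
            # Assumption: evict the least recently seen slot, never slot 0.
            slot = min(range(1, len(self.tags)),
                       key=self.last_used.__getitem__)
            # Cells that used the evicted tag fall back to slot 0.
            for i, s in enumerate(cells):
                if s == slot:
                    cells[i] = 0
        self.tags[slot] = tag
        self.last_used[slot] = self.clock
        return slot
```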

Do we want some smarts for this?

  E.g., assuming a 4-language limit:

        [fr]internationalisation
     [en-GB]internationalisation
     [en-US]internationalization

Then, upon receiving [it], it would unify [en-GB] and [en-US] under an
[en] tag, leaving space for

        [it]internazionalizzazione

Similarly, the distinction between [ja] and [zh] and [ko] would be a
higher priority to maintain than the difference between [da], [de], [en],
[es], [fr], [it], and [pt].

It could also prefer keeping tags with lots of text over tags with only a
character or two.

Probably this is just way too overcomplicated, though. With as many as 16
tags this is not going to happen too often.
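
For what it's worth, the first of those heuristics (folding [en-GB] and
[en-US] into [en]) could be sketched roughly like this. The function name
is invented, and the script-priority and amount-of-text ideas are not
modelled here.

```python
from collections import defaultdict


def merge_candidates(tags):
    """Group cached tags by primary subtag.

    Returns only the groups with more than one member, i.e. the tags that
    could be unified under their shared primary language to free a slot.
    """
    groups = defaultdict(list)
    for tag in tags:
        if tag:  # skip the permanent "" slot
            groups[tag.split("-")[0].lower()].append(tag)
    return {primary: members
            for primary, members in groups.items()
            if len(members) > 1}
```

With the cache from the example above, `merge_candidates(["", "fr",
"en-GB", "en-US"])` points at the two English tags as the pair to unify
under "en".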

>   - Xterm should associate with every filled language tag slot a set of
>     opened fonts (normal, bold, wide, etc.). If the language tag (RFC 1766)
>     has the form xx-yy or just xx, then xterm will attempt to open the
>     current fonts with the ADD_STYLE_NAME values
> 
>        "xx_YY"
>        "xx"
>        "xx_*"
>        ""
>        "*"
> 
>     in that order to find fonts for that slot (after checking whether the
>     same font hasn't been opened before for another slot). Note: For
>     performance reasons, it is important that fonts with identical XLFD are
>     never opened twice, even if they are associated with different language
>     tag slots.
> 
>   - ESC sequences that reposition the cursor always set the current language
>     tag to 0. LF, CR, FF, BS, VT, HT, etc. will not change the current
>     language tag.
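
The ADD_STYLE_NAME lookup order quoted above could be sketched as follows.
This is illustrative Python, not xterm code: the names are invented, a tag
without a region part is assumed to simply skip the "xx_YY" entry, and the
font cache is keyed on the requested name, which only approximates
"identical XLFD" once wildcards are involved.

```python
def add_style_candidates(tag):
    """ADD_STYLE_NAME values to try, in order, for a tag "xx-yy" or "xx"."""
    parts = tag.split("-")
    lang = parts[0].lower()
    out = []
    if lang and len(parts) == 2:
        out.append(f"{lang}_{parts[1].upper()}")   # "xx_YY"
    if lang:
        out += [lang, f"{lang}_*"]                 # "xx", "xx_*"
    out += ["", "*"]                               # generic fallbacks
    return out


class FontCache:
    """Open each font name at most once, however many slots share it."""

    def __init__(self, opener):
        self._opener = opener   # hypothetical callable that loads a font
        self._fonts = {}

    def get(self, name):
        if name not in self._fonts:
            self._fonts[name] = self._opener(name)
        return self._fonts[name]
```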

Do we want a facility to set the default language to something other than
""?

>   - xlib will be modified to support the addition of locale dependent
>     language tags in a CTEXT -> UTF8_STRING conversion.
> 
>   - xlib will be modified to take into account the information from
>     language tags in a UTF8_STRING -> CTEXT conversion.
> 
>   - xterm will only support UTF8_STRING selections in UTF-8 mode. Anything
>     else will be converted by Xlib.

Hm, will language tags appear in exported UTF8_STRINGs too, and if so, how
will this interoperate with applications that aren't expecting them?

>   - the non-8-bit mode of Xterm will remain completely free from any
>     ISO 2022 related code. Any ISO 2022 related processing for communication
>     with legacy applications will be handled completely in the C or X11
>     library.


-- 
Robert Brady
[EMAIL PROTECTED]




-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/