On Thu, 8 Feb 2001, Markus Kuhn wrote:
> Implementation proposal (surely incomplete, so check carefully):
>
> - Xterm should be modified to parse Plane 14 language tags and preserve
> the last seen complete language tag as terminal state information.
> No glyphs will be written for any Plane 14 tag characters (U-000E0000
> .. U-000E007F). wcwidth() of all Plane 14 tag characters will be set
> to 0 in glibc, but xterm will not treat Plane 14 characters like other
> non-spacing characters.
Agreed. We can do the first part of this (ignoring such characters in
xterm) with virtually no hassle at all.
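
A rough sketch of that first part, in Python rather than xterm's C, just to
pin down the state machine. The names TagFilter and encode_tag are made up
for illustration. It assumes a tag counts as "complete" once the first
non-tag character arrives, and that U+E007F CANCEL TAG resets the state to
the initial "" tag; neither detail is settled by the proposal.

```python
LANGUAGE_TAG = 0xE0001                   # U+E0001 LANGUAGE TAG starts a tag
CANCEL_TAG = 0xE007F                     # U+E007F CANCEL TAG
TAG_FIRST, TAG_LAST = 0xE0020, 0xE007E   # tag characters mirroring ASCII


def encode_tag(tag):
    """Hypothetical helper: spell an ASCII language tag in Plane 14 characters."""
    return chr(LANGUAGE_TAG) + "".join(chr(0xE0000 + ord(c)) for c in tag)


class TagFilter:
    """Drops Plane 14 tag characters and remembers the last complete tag."""

    def __init__(self):
        self.current = ""      # last complete language tag; "" initially
        self._pending = None   # tag being collected, None when outside a tag

    def feed(self, text):
        """Return only the characters that should reach the cell matrix."""
        visible = []
        for ch in text:
            cp = ord(ch)
            if cp == LANGUAGE_TAG:
                self._pending = ""
            elif TAG_FIRST <= cp <= TAG_LAST:
                # Tag characters outside a tag are silently dropped.
                if self._pending is not None:
                    self._pending += chr(cp - 0xE0000)
            elif cp == CANCEL_TAG:
                # Assumption: cancel means "back to the initial tag".
                self._pending = None
                self.current = ""
            elif 0xE0000 <= cp <= 0xE007F:
                pass           # rest of the tag block: no glyph either
            else:
                if self._pending is not None:
                    # First ordinary character completes the pending tag.
                    self.current = self._pending
                    self._pending = None
                visible.append(ch)
        return "".join(visible)
```

Feeding encode_tag("en-GB") + "colour" through feed() returns just "colour"
and leaves current set to "en-GB".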
> - Xterm should keep a cache of the last 16 seen language tags and assign
> them cache slot numbers.
>
> - The initial language tag is "" and it is permanently assigned to
> slot number 0.
>
> - Xterm should associate the cache slot number (4-bit) of the last seen
> complete language tag with every character that is newly written into the
> cell matrix.
That sounds good. This doesn't expand it very much (well, not compared to
the bloat I added :), and we get to keep 4 bits left over for assorted
other purposes. (Hm, italics?)
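
To illustrate the bit budget, here is a minimal sketch assuming one extra
byte per cell, with the slot number in the low nibble and the spare 4 bits
in the high nibble. The layout, the function names, and the ITALIC flag are
all assumptions, not anything xterm does today.

```python
SLOT_BITS = 4
SLOT_MASK = (1 << SLOT_BITS) - 1   # 0x0F: sixteen cache slots
SPARE_MASK = 0x0F                  # the four bits left over
ITALIC = 0x1                       # hypothetical use of one spare bit


def pack(slot, spare=0):
    """Combine a language tag slot and the spare bits into one byte."""
    if not 0 <= slot <= SLOT_MASK:
        raise ValueError("slot must fit in 4 bits")
    if not 0 <= spare <= SPARE_MASK:
        raise ValueError("spare bits must fit in 4 bits")
    return (spare << SLOT_BITS) | slot


def unpack(byte):
    """Split a per-cell byte back into (slot, spare)."""
    return byte & SLOT_MASK, byte >> SLOT_BITS
```

With this layout, pack(5, ITALIC) gives 0x15 and unpack(0x15) gives (5, 1).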
> - If a language tag has to be removed from the cache, then all characters
> in the charcell matrix associated with that slot get assigned to slot 0
> before the language tag slot is reused.
Do we want some smarts for this?
E.g., assuming a 4-language limit:
[fr]internationalisation
[en-GB]internationalisation
[en-US]internationalization
Then, upon receiving [it], it would unify [en-GB] and [en-US] under an
[en] tag, leaving space for
[it]internazionalizzazione
Similarly, the distinction between [ja] and [zh] and [ko] would be a
higher priority to maintain than the difference between [da], [de], [en],
[es], [fr], [it], and [pt].
It could also prefer keeping tags with lots of text over tags with only a
character or two.
Probably this is far too complicated, though. With as many as 16
tags this is not going to happen too often.
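
For what it's worth, the unification idea could look roughly like the
sketch below. It assumes the cache is a plain list indexed by slot number
(slot 0 being "") and that each cell stores only its slot number. The
function names are invented, and the victim for the fallback case is
chosen by the caller. The script-priority and text-volume heuristics are
left out.

```python
def primary(tag):
    """Primary subtag of an RFC 1766 tag, e.g. 'en-GB' -> 'en'."""
    return tag.split("-", 1)[0].lower()


def merge_related(tags, cells):
    """Merge two slots whose tags share a primary subtag.

    Returns the freed slot number, or None if no two tags are related.
    """
    seen = {}                          # primary subtag -> first slot using it
    for slot in range(1, len(tags)):   # slot 0 ("") is never touched
        if tags[slot] is None:
            continue
        key = primary(tags[slot])
        if key in seen:
            keep = seen[key]
            tags[keep] = key           # e.g. en-GB and en-US become en
            for i, s in enumerate(cells):
                if s == slot:
                    cells[i] = keep
            tags[slot] = None
            return slot
        seen[key] = slot
    return None


def free_slot(tags, cells, victim):
    """Free one slot: merge related tags if possible, else evict `victim`."""
    freed = merge_related(tags, cells)
    if freed is None:
        # Plain behaviour from the proposal: evicted cells fall back to slot 0.
        for i, s in enumerate(cells):
            if s == victim:
                cells[i] = 0
        tags[victim] = None
        freed = victim
    return freed
```

With tags ["", "fr", "en-GB", "en-US"], free_slot() rewrites slot 2 to
"en", moves the slot 3 cells over to it, and hands back slot 3 for [it].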
> - Xterm should associate with every filled language tag slot a set of
> opened fonts (normal, bold, wide, etc.). If the language tag (RFC 1766)
> has the form xx-yy or just xx, then xterm will attempt to open the
> current fonts with the ADD_STYLE_NAME values
>
> "xx_YY"
> "xx"
> "xx_*"
> ""
> "*"
>
> in that order to find fonts for that slot (after checking whether the
> same font hasn't been opened before for another slot). Note: For
> performance reasons, it is important that fonts with identical XLFD are
> never opened twice, even if they are associated with different language
> tag slots.
>
> - ESC sequences that reposition the cursor always set the current language
> tag to 0. LF, CR, FF, BS, VT, HT, etc. will not change the current
> language tag.
Do we want a facility to set the default language to something other than
""?
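
On the font side, the ADD_STYLE_NAME lookup order quoted above and the
"never open the same XLFD twice" rule might be sketched like this. It is
only an illustration: open_font stands in for whatever actually loads a
font, and the behaviour for a bare "xx" tag or for the "" tag is my guess
at what the proposal intends.

```python
def add_style_candidates(tag):
    """ADD_STYLE_NAME values to try, in order, for a language tag."""
    parts = tag.split("-")
    lang = parts[0].lower()
    candidates = []
    if lang and len(parts) >= 2 and parts[1]:
        candidates.append(f"{lang}_{parts[1].upper()}")   # "xx_YY"
    if lang:
        candidates += [lang, f"{lang}_*"]                 # "xx", "xx_*"
    candidates += ["", "*"]                               # generic fallbacks
    return candidates


class FontCache:
    """Opens each XLFD at most once, however many slots share it."""

    def __init__(self, open_font):
        self._open = open_font   # hypothetical loader: XLFD string -> font
        self._fonts = {}         # XLFD string -> already opened font

    def get(self, xlfd):
        if xlfd not in self._fonts:
            self._fonts[xlfd] = self._open(xlfd)
        return self._fonts[xlfd]
```

For "en-GB" this tries "en_GB", "en", "en_*", "" and "*" in that order;
two slots that resolve to the same XLFD end up sharing one opened font.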
> - xlib will be modified to support the addition of locale dependent
> language tags in a CTEXT -> UTF8_STRING conversion.
>
> - xlib will be modified to take into account the information from
> language tags in a UTF8_STRING -> CTEXT conversion.
>
> - xterm will only support UTF8_STRING selections in UTF-8 mode. Anything
> else will be converted by Xlib.
Hm, will language tags appear in exported UTF8_STRINGs too, and if so how
will this operate with things that aren't expecting them?
> - the non-8-bit mode of Xterm will remain completely free from any
> ISO 2022 related code. Any ISO 2022 related processing for communication
> with legacy applications will be handled completely in the C or X11
> library.
--
Robert Brady
[EMAIL PROTECTED]
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/