Re: Towards handling CJK style variants in UTF-8 xterm

Markus Kuhn Fri, 09 Feb 2001 00:40:20 -0800
Tomohiro KUBOTA wrote on 2001-02-09 04:54 UTC:
> - Some on-demand-font-loading mechanism would be needed.

Agreed.

> - in "locale-encoding" mode, it is obvious that the language which
>   is specified by the locale is the only language.  Thus, in "locale-
>   encoding" mode, "default language" should not be "" but the language
>   specified by the locale.  Well, as Roozbeh said, locale language
>   can be the default not only in "locale-encoding" mode but also
>   in UTF-8 mode.

Eventually there will be only the locale-encoding mode, because I hope
that all target operating systems will by then have UTF-8 locales and
define __STDC_ISO_10646__. Having UTF-8 encoders/decoders inside xterm
is just a temporary solution until the various C libraries are mature
enough.

Let me update my proposal:

    If the language tag (RFC 1766) has the form uu-vv or just uu, and the
    locale has the form xx(_YY)?([\.@].*)? then xterm will attempt to open
    the current fonts with the ADD_STYLE_NAME values

       "uu_VV"
       "uu"
       "uu_*"
       "xx_YY"
       "xx"
       "xx_*"
       ""
       "*"

    in this order (if any of the used components uu, vv, xx, yy is not
    available, the ADD_STYLE_NAME patterns in which they appear will
    not be used).
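
As a rough sketch of the lookup order above (Python for illustration only,
not xterm code): the function name add_style_candidates and both regular
expressions are hypothetical, and dropping repeated entries when the
language tag and the locale coincide is an assumption the proposal does
not make.

```python
import re

# Hypothetical patterns: "uu" or "uu-vv" for the language tag, and
# "xx", "xx_YY", optionally followed by ".codeset" or "@modifier",
# for the locale name.
_TAG_RE = re.compile(r"^([A-Za-z]{2})(?:-([A-Za-z]{2}))?$")
_LOCALE_RE = re.compile(r"^([A-Za-z]{2})(?:_([A-Za-z]{2}))?(?:[.@].*)?$")


def _split(pattern, text):
    """Return (language, region); either part is None if unavailable."""
    match = pattern.match(text or "")
    if not match:
        return None, None
    lang = match.group(1).lower()
    region = match.group(2).upper() if match.group(2) else None
    return lang, region


def add_style_candidates(lang_tag, locale):
    """List ADD_STYLE_NAME patterns in the order they would be tried."""
    candidates = []
    for lang, region in (_split(_TAG_RE, lang_tag),
                         _split(_LOCALE_RE, locale)):
        if lang is None:
            continue  # component missing: skip every pattern that uses it
        if region is not None:
            candidates.append(f"{lang}_{region}")
        candidates.append(lang)
        candidates.append(f"{lang}_*")
    candidates += ["", "*"]

    # Assumption: a pattern that already appeared is not tried again.
    ordered = []
    for name in candidates:
        if name not in ordered:
            ordered.append(name)
    return ordered
```

For example, a language tag "zh-tw" in a ja_JP.eucJP locale would try the
zh patterns first, then the ja patterns, then "" and "*".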

> - (very optional) Since UTF-8 with language tag can guarantee
>   ISO-2022 round-trip compatibility, someone can develop a wrapper
>   for UTF-8 XTerm that receive input as ISO-2022, convert it into
>   UTF-8, and give it to XTerm. 

It would be nice to have this in glibc, and I think it is feasible, even
though it adds *significantly* to the amount of state that mbstate_t has
to store: it must hold not only incomplete UTF-8 sequences, but also
incomplete ISO 2022 ESC sequences and incomplete Plane 14 language tags
that have not yet been read or written completely.

It remains to be seen how much bloat people will tolerate in mbstate_t.
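
To give a feeling for how long a single language tag is on the wire, here
is a small sketch (Python for illustration; encode_language_tag is a
hypothetical helper, not a glibc or xterm function). It relies only on the
Plane 14 layout: U+E0001 LANGUAGE TAG followed by tag characters at
U+E0000 plus the ASCII code of each tag letter.

```python
LANGUAGE_TAG = 0xE0001   # U+E0001 LANGUAGE TAG
TAG_BASE = 0xE0000       # tag characters mirror ASCII at this offset


def encode_language_tag(tag):
    """Return the Plane 14 character sequence for an ASCII tag like 'ja'."""
    for ch in tag:
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError("language tags use printable ASCII only")
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_BASE + ord(ch)) for ch in tag)


# Each Plane 14 character takes 4 bytes in UTF-8, so even the two-letter
# tag "ja" occupies 12 bytes, and a read() may stop anywhere inside them.
encoded = encode_language_tag("ja").encode("utf-8")
print(len(encoded), encoded.hex(" "))
```

A converter that is interrupted after, say, 5 of those 12 bytes has to
remember both the partial UTF-8 sequence and the part of the tag already
seen, which is the extra state mentioned above.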

> - Specifying "typeface", i.e., Roman or Italic, is not a feature
>   of plain text but a feature of rich text.  No, I don't oppose
>   this idea.  I just want to point out that implementation of this
>   should have lower priority than language tag and so on.

We are talking primarily about terminal emulator semantics here. Italics
support (ESC [ 3 m I think) belongs in a terminal emulator just as bold
and inverse do; there is no question about that. XFree86 now has italic
versions of the most commonly used xterm fonts, and emacs is a very
popular program that makes intensive use of italics in various modes. So
there are definitely very good reasons for adding italics support to
xterm. It is an issue orthogonal to language tagging support, but I
mentioned it here because if someone digs into the font loading and
character attribute parts of xterm anyway, that is a good time to add
italics support as well.
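
For reference, a minimal sketch of what an application would send (Python
for illustration; the helper name italic is hypothetical). In ECMA-48,
SGR 3 selects italics and SGR 23 switches them off again; whether the
text actually appears slanted depends on the terminal and its fonts.

```python
ESC = "\x1b"


def italic(text):
    """Wrap text in SGR 3 (italic on) and SGR 23 (italic off)."""
    return f"{ESC}[3m{text}{ESC}[23m"


print(italic("emphasised") + " and plain again")
```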

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/