On Mon, 25 Jun 2001, Bruno Haible wrote:
> A language tag is sufficient, because all Japanese charsets behave
> the same w.r.t. rendition of some specific characters. It's kind of a
> national custom.
So double-width Cyrillic would become tagged with ru_JP, such that the
spell checker knows it is supposed to be Russian and the rendering engine
knows that it is supposed to be as double-width as kterm would make it?
(Just kidding, maybe ... :-)
The problem I see with language tags in that context is just that they
reintroduce the exact state that we wanted to get rid of so badly. The
wchar_t encoding described on
http://www.cl.cam.ac.uk/~mgk25/ucs/iso2022-wc.html
has the advantage that functions such as wcwidth() still can be
implemented, because all relevant information is located within a single
context-free wchar_t value, not somewhere else in the stream, and they can
even be implemented in a mostly locale-independent way.
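
As a minimal sketch of that point, assume a purely hypothetical layout in
which the low 21 bits of the wide character hold the UCS code point and
four higher bits hold a tag naming the ISO 2022 character set it came
from. This is NOT the layout defined on the page above; the type
tagged_wc, the TAG_* values, and both functions are invented here for
illustration only, and the width table is heavily abridged. The point is
only that the width function needs nothing but the value it is handed:

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL layout, for illustration only (not the layout from the
 * iso2022-wc page): bits 0-20 hold the UCS code point, bits 24-27 hold
 * a tag naming the ISO 2022 character set the character came from. */
typedef uint32_t tagged_wc;

enum { TAG_NONE = 0, TAG_JIS_X_0208 = 1 };  /* hypothetical tag values */

/* Pack a code point and a character-set tag into one wide character. */
static tagged_wc make_tagged(uint32_t ucs, unsigned tag)
{
    assert(ucs <= 0x1FFFFFu && tag <= 0xFu);
    return ((uint32_t)tag << 24) | ucs;
}

/* The width depends only on the value passed in: no shift state and
 * no language tag elsewhere in the stream has to be consulted. */
static int tagged_wcwidth(tagged_wc wc)
{
    unsigned tag = (wc >> 24) & 0xFu;
    uint32_t ucs = wc & 0x1FFFFFu;

    if (tag == TAG_JIS_X_0208)
        return 2;   /* everything that came from a double-width set */
    if (ucs >= 0x4E00 && ucs <= 0x9FFF)
        return 2;   /* CJK Unified Ideographs (abridged table) */
    return 1;       /* abridged: ignores control and combining characters */
}
```

With this sketch, a Cyrillic letter such as U+0416 yields width 2 when it
carries the JIS X 0208 tag and width 1 when it carries no tag, and the
caller never has to look at neighbouring characters to find out which.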
Just to avoid misunderstandings, I am personally not in the least interested
in actually having an ISO 2022 multi-byte locale. The above proposal is
really just an argument against the people who want one. Nevertheless,
what it proposes makes sense and is practical if you really want ISO 2022
round-trip compatibility as badly as some people claim they do because
they have grown so accustomed to the MULE way of life. In that respect,
it is actually very comparable to language tags.
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/