Bruno Haible wrote on 2001-04-11 13:11 UTC:
> wcwidth is locale dependent, and in the UTF-8 locale, we adhere to UAX
> #11. It doesn't give you the freedom to define the width of "TRIPLE
> INTEGRAL SIGN" arbitrarily.
To be correct, please replace "we adhere to UAX #11" with "we adhere to
our private interpretation of UAX #11". UAX #11 very carefully tries to
avoid defining what we actually want to have defined. It just documents
practice for other coded character sets. Read it carefully. It is more
concerned about the issues of EUC-JP <-> Unicode conversion than about
terminal semantics.
UAX #11 does not prescribe any wcwidth() values. It gets us halfway
there by making it clear that there can be little dispute that
wcwidth of East Asian Full-width (F) and East Asian Wide (W) characters
must be 2 and that wcwidth of East Asian Half-width (H) and East Asian
Narrow (Na) must be 1. That's the easy part.
The difficult part is to decide what to do with East Asian Ambiguous (A)
characters, that is, all characters that can be sometimes wide and
sometimes narrow in different legacy systems, and with Neutral (N)
characters, which do not appear in CJK legacy character sets and probably
have never been used on tty-style output devices before. The EM DASH and
almost all types of symbols are in this category.
The two conventions that I have defined so far simply give Neutral (N)
characters wcwidth=1 always, while Ambiguous (A) characters get
wcwidth=1 in one definition and wcwidth=2 in the other. That's the
easy way of cheating around the real problem. I think these two
conventions are useful and here to stay, but there might be room for a
third one with more careful consideration given to each N and A
character.
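
A minimal sketch of these two conventions in Python, using the East
Asian Width property exposed by the standard unicodedata module. The
function name width_sketch and the ambiguous_wide flag are illustrative
assumptions, not part of any existing wcwidth() implementation, and the
sketch deliberately ignores combining and control characters, which a
real wcwidth() has to treat separately.

```python
import unicodedata


def width_sketch(ch: str, ambiguous_wide: bool = False) -> int:
    """Column width of one character under the two conventions above.

    Hypothetical helper: ambiguous_wide=False models the convention
    where Ambiguous (A) is narrow, ambiguous_wide=True the one where
    Ambiguous (A) is wide. Neutral (N) is always 1 in both.
    """
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("F", "W"):
        # Full-width and Wide: the undisputed double-width case.
        return 2
    if eaw == "A":
        # Ambiguous: the only class where the two conventions differ.
        return 2 if ambiguous_wide else 1
    # Half-width (H), Narrow (Na) and Neutral (N) all get one column.
    return 1


if __name__ == "__main__":
    for ch in ("A", "\u4e00", "\uff21", "\u00a1", "\u05d0"):
        print(
            f"U+{ord(ch):04X}",
            unicodedata.east_asian_width(ch),
            width_sketch(ch),
            width_sketch(ch, ambiguous_wide=True),
        )
```

The property values come from whatever Unicode data ships with the
Python build in use, so the class of an individual character may differ
from the data that was current when this was written.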
We probably can't do any better for the Ambiguous (A) class; we really
need at least two locales here. The question that Florian brought up was
essentially what to do with the Neutral (N) characters. There are indeed
many that would be useful, even for European users, only if they were
double-width. What we can discuss (though I recommend against early
implementation) is to consider using double-width characters liberally
also for Neutral (N) characters for which this form factor would
seem typographically appropriate. I'm thinking not only about the EM
DASH and some of the wider math symbols, but actually most of the
graphical symbols, keycaps and dingbats as well.
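
One way such a third convention could be sketched in Python is as a
per-character override set for Neutral (N) characters, layered on top of
the East Asian Width classes. Everything named here (width_third,
wide_neutral, the choice of TRIPLE INTEGRAL as a candidate) is a
hypothetical illustration, not an agreed list or an existing API, and
combining and control characters are again left out.

```python
import unicodedata


def width_third(ch, wide_neutral=frozenset(), ambiguous_wide=False):
    """Hypothetical third convention: selected Neutral chars are wide.

    wide_neutral is an assumed, hand-picked set of characters that
    would be displayed double-width although their class is N.
    """
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("F", "W"):
        return 2
    if eaw == "A":
        # Ambiguous still needs a locale-level choice.
        return 2 if ambiguous_wide else 1
    if eaw == "N" and ch in wide_neutral:
        # Case-by-case decision for Neutral characters.
        return 2
    return 1


if __name__ == "__main__":
    # Assumed candidate only: U+222D TRIPLE INTEGRAL.
    candidates = frozenset({"\u222d"})
    for ch in ("\u222d", "A", "\u4e00"):
        print(f"U+{ord(ch):04X}", width_third(ch, candidates))
```

Keeping the overrides in a separate set leaves the two existing
conventions untouched when the set is empty.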
If we are going to embrace the concept of biwidth tty display anyway,
why not also use it for characters that haven't been used on ttys
before?
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>