I am considering changing my wcwidth() definition at
http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
so that wcwidth() == 0 for the following 11 characters. Terminal
emulator applications could then handle these Unicode layout control
characters with the same mechanisms that they already use for
combining characters:
U+200B ZERO WIDTH SPACE
U+200C ZERO WIDTH NON-JOINER
U+200D ZERO WIDTH JOINER
U+200E LEFT-TO-RIGHT MARK
U+200F RIGHT-TO-LEFT MARK
U+202A LEFT-TO-RIGHT EMBEDDING
U+202B RIGHT-TO-LEFT EMBEDDING
U+202C POP DIRECTIONAL FORMATTING
U+202D LEFT-TO-RIGHT OVERRIDE
U+202E RIGHT-TO-LEFT OVERRIDE
U+FEFF ZERO WIDTH NO-BREAK SPACE
In the -misc-fixed-* fonts, all these characters would then be
represented by an empty space glyph, so that they remain invisible when
treated like an overstriking combining character.
See section 13.2 of The Unicode Standard 3.0 for the semantics and for
application examples of these characters.
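As a rough sketch of the proposed change (not the actual wcwidth.c
code; the function names is_layout_control() and
width_with_layout_controls() are made up for illustration, and
base_width stands for whatever the existing wcwidth() would return),
the 11 characters listed above fall into two contiguous ranges plus
U+FEFF:

```c
#include <assert.h>
#include <stddef.h>   /* wchar_t */

/* Hypothetical helper: returns 1 if ucs is one of the 11 layout
 * control characters listed above, 0 otherwise. */
int is_layout_control(wchar_t ucs)
{
    return (ucs >= 0x200B && ucs <= 0x200F) ||   /* ZWSP .. RLM */
           (ucs >= 0x202A && ucs <= 0x202E) ||   /* LRE .. RLO  */
           ucs == 0xFEFF;                        /* ZWNBSP      */
}

/* Hypothetical wrapper: forces width 0 for the layout controls and
 * passes through the width computed elsewhere for everything else. */
int width_with_layout_controls(wchar_t ucs, int base_width)
{
    return is_layout_control(ucs) ? 0 : base_width;
}

/* Small sanity check of the sketch. */
void check_layout_controls(void)
{
    assert(width_with_layout_controls(0x200D, 1) == 0);
    assert(width_with_layout_controls(L'A', 1) == 1);
}
```

A real implementation would presumably fold these ranges into the
existing table of zero-width intervals rather than add a separate
test.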
Comments and opinions?
Is it generally considered useful for xterm and friends not to
advance the cursor when one of these characters is received, but to
preserve it in memory for later use in selections?
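To make the question concrete, here is a minimal sketch of such
behaviour. It is not how xterm stores its screen; struct cell,
struct line, line_put() and the limits MAX_ATTACHED and LINE_CELLS are
all assumptions invented for this example, and double-width characters
are ignored for brevity. A zero-width character is kept with the
preceding cell and leaves the cursor where it is:

```c
#include <assert.h>
#include <stddef.h>   /* wchar_t */

#define MAX_ATTACHED 4    /* assumed limit, chosen only for this sketch */
#define LINE_CELLS   80   /* assumed line length */

struct cell {
    wchar_t base;                    /* spacing character shown in the cell */
    wchar_t attached[MAX_ATTACHED];  /* zero-width characters kept with it */
    int     n_attached;
};

struct line {
    struct cell cells[LINE_CELLS];
    int cursor;                      /* column for the next spacing character */
};

/* Store ucs in the line; width is the wcwidth() result for ucs
 * (only 0 and 1 are handled here). Returns the new cursor column. */
int line_put(struct line *ln, wchar_t ucs, int width)
{
    if (width == 0) {
        /* Keep the character with the previous cell, cursor unchanged.
         * It is dropped if there is no previous cell or no room left. */
        if (ln->cursor > 0) {
            struct cell *c = &ln->cells[ln->cursor - 1];
            if (c->n_attached < MAX_ATTACHED)
                c->attached[c->n_attached++] = ucs;
        }
    } else if (ln->cursor < LINE_CELLS) {
        struct cell *c = &ln->cells[ln->cursor++];
        c->base = ucs;
        c->n_attached = 0;
    }
    return ln->cursor;
}

/* Small usage example: 'a', ZERO WIDTH JOINER, 'b'. */
void check_line_put(void)
{
    struct line ln = {0};
    assert(line_put(&ln, L'a', 1) == 1);
    assert(line_put(&ln, 0x200D, 0) == 1);   /* cursor does not move */
    assert(line_put(&ln, L'b', 1) == 2);
}
```

A selection routine could then emit each cell's base character
followed by its attached characters, so that the layout controls
survive cut and paste.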
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/