I am considering changing my wcwidth() definition at

  http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c

so that wcwidth() returns 0 for the following 11 characters. This would
allow terminal emulator applications to handle these Unicode layout
control characters with the same mechanisms that they already use for
combining characters:

  U+200B  ZERO WIDTH SPACE
  U+200C  ZERO WIDTH NON-JOINER
  U+200D  ZERO WIDTH JOINER
  U+200E  LEFT-TO-RIGHT MARK
  U+200F  RIGHT-TO-LEFT MARK
  U+202A  LEFT-TO-RIGHT EMBEDDING
  U+202B  RIGHT-TO-LEFT EMBEDDING
  U+202C  POP DIRECTIONAL FORMATTING
  U+202D  LEFT-TO-RIGHT OVERRIDE
  U+202E  RIGHT-TO-LEFT OVERRIDE
  U+FEFF  ZERO WIDTH NO-BREAK SPACE

In the -misc-fixed-* fonts, all these characters would then be
represented as an empty space glyph, such that they remain invisible if
treated like an overstriking combining character.
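
As a rough sketch of what the proposed change could look like for the 11
characters listed above (the names is_layout_control() and sketch_width()
are made up for this illustration and are not the ones used in wcwidth.c;
the combining and double-width ranges are left out entirely, so this is
not a replacement for the real function):

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if ucs is one of the 11 layout control
 * characters under discussion (U+200B..U+200F, U+202A..U+202E, U+FEFF). */
static int is_layout_control(uint32_t ucs)
{
    return (ucs >= 0x200B && ucs <= 0x200F) ||
           (ucs >= 0x202A && ucs <= 0x202E) ||
           ucs == 0xFEFF;
}

/* Hypothetical, heavily simplified width function. Assumption: every
 * character that is not a control or layout control character occupies
 * one column; combining and wide characters are not handled here. */
static int sketch_width(uint32_t ucs)
{
    if (ucs == 0)
        return 0;                       /* null character */
    if (ucs < 0x20 || (ucs >= 0x7F && ucs < 0xA0))
        return -1;                      /* C0/C1 control characters */
    if (is_layout_control(ucs))
        return 0;                       /* proposed: no cursor advance */
    return 1;                           /* everything else, simplified */
}
```

With this, sketch_width(0x200D) gives 0 while sketch_width('A') gives 1.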

See section 13.2 of The Unicode Standard 3.0 for the semantics and for
application examples of these characters.

Comments and opinions?

Is it generally considered useful for xterm and friends not to advance
the cursor when one of these characters is received, but to preserve it
in memory for later use in selections?
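
To make the question concrete, here is a minimal sketch of the behaviour I
have in mind. It is not how xterm stores its screen; the struct layout,
the limit MAX_ATTACHED, and all function names are assumptions made up
for this example. A zero-width character is attached to the preceding
cell without moving the cursor, and a selection copies it out again in
logical order:

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ATTACHED 4   /* arbitrary limit chosen for this sketch */
#define LINE_CELLS   80  /* assumed line length */

struct cell {
    uint32_t base;                    /* spacing character in this cell */
    uint32_t attached[MAX_ATTACHED];  /* zero-width characters after it */
    size_t   n_attached;
};

struct line {
    struct cell cells[LINE_CELLS];
    size_t cursor;                    /* index of the next cell to write */
};

/* Hypothetical predicate covering the 11 characters discussed here. */
static int is_zero_width_control(uint32_t ucs)
{
    return (ucs >= 0x200B && ucs <= 0x200F) ||
           (ucs >= 0x202A && ucs <= 0x202E) ||
           ucs == 0xFEFF;
}

/* Store one received character. Zero-width characters are attached to
 * the previous cell and leave the cursor where it is. */
static void put_char(struct line *ln, uint32_t ucs)
{
    if (is_zero_width_control(ucs)) {
        if (ln->cursor == 0)
            return;                   /* no previous cell: dropped here */
        struct cell *c = &ln->cells[ln->cursor - 1];
        if (c->n_attached < MAX_ATTACHED)
            c->attached[c->n_attached++] = ucs;
        return;
    }
    if (ln->cursor < LINE_CELLS) {
        struct cell *c = &ln->cells[ln->cursor++];
        c->base = ucs;
        c->n_attached = 0;
    }
}

/* Copy the line as a selection would, including attached characters.
 * Returns the number of code points written to out (at most cap). */
static size_t copy_selection(const struct line *ln, uint32_t *out, size_t cap)
{
    size_t n = 0;
    for (size_t i = 0; i < ln->cursor; i++) {
        const struct cell *c = &ln->cells[i];
        if (n < cap)
            out[n++] = c->base;
        for (size_t j = 0; j < c->n_attached; j++)
            if (n < cap)
                out[n++] = c->attached[j];
    }
    return n;
}
```

Feeding 'a', U+200D, 'b' into put_char() leaves the cursor at column 2,
yet copy_selection() returns all three code points.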

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/
