Mark Leisher wrote on 2000-09-27 08:31 UTC:
> 
> On Mon, 25 Sep 2000, Markus Kuhn wrote:
> 
>     >> The zero-width spaces/joiner are only required for ligature
>     >> output. This is for the foreseeable future probably outside the scope of
>     >> VT100-like terminal emulators, and therefore also outside the scope of
>     >> wcwidth().
> 
> I missed this the first time around and must disagree.  We use ZERO WIDTH
> SPACE between "words" in Chinese, Japanese, Thai, Lao, etc.  Not strictly
> necessary for terminal emulators, but handy for "word" selection.  We use ZERO
> WIDTH NON-JOINER to present Persian compound words correctly, and compounds
> are common in Persian.  And the ZERO WIDTH JOINER is very handy for displaying
> contextual forms of letters in the Arabic block that do not have their
> separate forms included in Unicode.
> 
> I would go so far as to claim support for these is necessary.  If not
> immediately, then eventually.

Support for these (if desired in terminal emulator applications) is in
fact trivial to add in the following way: I simply have to redefine

  wcwidth(ZERO WIDTH SPACE)      = 0
  wcwidth(ZERO WIDTH NON-JOINER) = 0
  wcwidth(ZERO WIDTH JOINER)     = 0

in glibc and xterm (the latter will eventually just use the
implementation of the former, so that everything comes out of the same
locale specification), and then xterm will treat all the ZERO WIDTH
characters exactly like combining characters with no ink.
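
As a rough illustration of the redefinition above (not the glibc or
xterm code), here is a minimal Python sketch. The name toy_wcwidth and
the use of Python's unicodedata tables are assumptions made for this
example; the real wcwidth() takes its widths from the locale definition.

```python
import unicodedata

# ZERO WIDTH SPACE, ZERO WIDTH NON-JOINER, ZERO WIDTH JOINER
ZERO_WIDTH = {0x200B, 0x200C, 0x200D}

def toy_wcwidth(ch):
    """Hypothetical stand-in for wcwidth(): columns used by one character."""
    cp = ord(ch)
    if cp == 0:
        return 0
    if cp < 0x20 or 0x7F <= cp < 0xA0:
        return -1                      # control characters have no width
    if cp in ZERO_WIDTH:
        return 0                       # the redefinition proposed above
    if unicodedata.combining(ch) != 0:
        return 0                       # ordinary combining marks
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2                       # wide CJK characters
    return 1

print(toy_wcwidth("\u200b"), toy_wcwidth("\u0301"), toy_wcwidth("a"))  # 0 0 1
```

With this, the three ZERO WIDTH characters are indistinguishable from
combining characters as far as column counting is concerned.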

They will have to get a space glyph in the fonts, and then the combining
overstriking will leave them invisible. They can be edited with (e.g.,
mined's) editing mechanisms for combining characters and they will (even
though they do not occupy their own character cell) remain fully
preserved during selections as long as the preceding character is
selected. In addition, wcwidth() and wcswidth() will still predict the
cursor motion correctly without any special treatment of zero-width
spaces in either the terminal emulator or the application.
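
To make the cursor-motion claim concrete, here is a small Python
sketch. Both function names are hypothetical, and the width table is a
simplified assumption rather than what glibc implements. It only shows
that a string width computed as the sum of per-character widths is
unchanged by embedded zero-width characters.

```python
import unicodedata

ZERO_WIDTH = {"\u200b", "\u200c", "\u200d"}   # ZWSP, ZWNJ, ZWJ

def toy_wcwidth(ch):
    """Hypothetical per-character width: -1, 0, 1 or 2 columns."""
    if ch < " " or "\x7f" <= ch < "\xa0":
        return -1                              # control character
    if ch in ZERO_WIDTH or unicodedata.combining(ch):
        return 0                               # zero-width or combining
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

def toy_wcswidth(s):
    """Hypothetical wcswidth(): total columns, or -1 on a control char."""
    total = 0
    for ch in s:
        w = toy_wcwidth(ch)
        if w < 0:
            return -1
        total += w
    return total

# The ZERO WIDTH SPACE between the "words" does not move the cursor.
print(toy_wcswidth("ab\u200bcd"))  # 4
print(toy_wcswidth("abcd"))        # 4
```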

Side note (on a question Robert asked long ago): It just occurred to me
that it is very important that terminal emulators do *not* advance the
cursor if they receive a combining character after a control character
(e.g., at the beginning of a line) or as the first character of a
session. The best approach is probably to silently drop wcwidth() = 0
characters that were not immediately preceded by a wcwidth() > 0
character (that was not part of a control sequence). Reason: Anything else would just
cause a disagreement between what wcswidth() says about a displayed line
length and what really happens on the screen.

Therefore, please do NOT automatically insert some default base
character in the terminal emulator whenever you receive a lone
combining character. This just messes up the otherwise clean and neat
correlation between wcswidth() and cursor behaviour.
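
The dropping rule can be sketched as follows. This is a toy
single-line model in Python, not xterm's implementation: render_line
and toy_wcwidth are hypothetical names, the width table is a
simplified assumption, and escape-sequence parsing is left out
entirely.

```python
import unicodedata

ZERO_WIDTH = {"\u200b", "\u200c", "\u200d"}   # ZWSP, ZWNJ, ZWJ

def toy_wcwidth(ch):
    """Hypothetical per-character width: -1, 0, 1 or 2 columns."""
    if ch < " " or "\x7f" <= ch < "\xa0":
        return -1
    if ch in ZERO_WIDTH or unicodedata.combining(ch):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

def render_line(text):
    """Return (cells, cursor_column) for a toy one-line terminal."""
    cells = []          # one string per base character plus its marks
    column = 0
    have_base = False   # was the previous character a width > 0 one?
    for ch in text:
        w = toy_wcwidth(ch)
        if w < 0:
            have_base = False        # control char: nothing to attach to
        elif w == 0:
            if have_base:
                cells[-1] += ch      # overstrike onto the previous cell
            # otherwise: dropped silently, no default base inserted
        else:
            cells.append(ch)
            column += w
            have_base = True
    return cells, column

print(render_line("\u0301abc"))    # (['a', 'b', 'c'], 3)
```

The lone combining character at the start is discarded, so the cursor
ends in column 3, which is the sum of the wcwidth() values of the
remaining characters.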

  Markus' razor: Keep the number of special cases for treating
  Unicode characters with added semantics in terminal emulators
  minimal in the interest of simplicity and robustness.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/
