Roozbeh Pournader wrote on 2001-05-24 20:06 UTC:
> On Thu, 24 May 2001, Markus Kuhn wrote:
>
> > Roozbeh Pournader wrote on 2001-05-24 14:58 UTC:
> > > Quoting '/usr/src/linux/drivers/char/consolemap.c' (Lines 647-648):
> > >
> > > else if (ucs == 0xfeff || (ucs >= 0x200a && ucs <= 0x200f))
> > > return -2; /* Zero-width space */
> > >
> > > Does this mean that there's no way we can make the console display
> > > characters for ZWNBSP (also known as BOM), ZWJ, ZWNJ, ..., something like
> > > a show controls mode? We really need your comments.
> >
> > It was probably me who wrote that console code half a decade ago,
> > wondering what to do with all these zero-width characters that might or
> > might not be present in plaintext.
>
> Don't you agree that this portion should be removed? The console displays
> characters that do not have an equivalent glyph in the font as zero-width,
> so the proper way for not showing these should be not assigning any glyph.

What do you mean by "that do not have an equivalent glyph in the font"?
Only seven characters are handled by the above code:

  200A # HAIR SPACE
  200B # ZERO WIDTH SPACE
  200C # ZERO WIDTH NON-JOINER
  200D # ZERO WIDTH JOINER
  200E # LEFT-TO-RIGHT MARK
  200F # RIGHT-TO-LEFT MARK
  FEFF # ZERO WIDTH NO-BREAK SPACE

The console *has* the appropriate glyph for all of these available: It
has the same color as the background and doesn't move the cursor.
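
To double-check the count, here is a rough sketch that wraps the quoted
condition in a standalone predicate and counts the matches in a range.
The names is_console_zero_width() and count_zero_width() are made up
for this illustration; they are not functions in consolemap.c.

```c
/* Hypothetical helper: the same test as the quoted kernel condition. */
static int is_console_zero_width(unsigned long ucs)
{
    return ucs == 0xfeff || (ucs >= 0x200a && ucs <= 0x200f);
}

/* Count the code points in [first, last] that the test matches. */
static unsigned count_zero_width(unsigned long first, unsigned long last)
{
    unsigned n = 0;
    unsigned long ucs;

    for (ucs = first; ucs <= last; ucs++)
        if (is_console_zero_width(ucs))
            n++;
    return n;
}
```

count_zero_width(0x0000, 0xffff) comes out as 7: the six code points
200A..200F plus FEFF, which is the list given above.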

You could argue that 0x200a should not be zero-width, and indeed today
we have wcwidth(0x200a) = 1 and transliterate it into a normal space.
That's more a matter of taste. But the rest seems perfectly OK to me.

All other characters "that do not have an equivalent glyph in the font"
are represented by CP437(0xfe) = U+25A0 (BLACK SQUARE) as the default
character.
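
The three outcomes can be sketched roughly as follows. This is a toy
model, not the kernel code: toy_font, toy_lookup(), glyph_to_draw() and
the slot numbers are all invented for the example, and placing the
zero-width test ahead of the font lookup is an assumption based on the
quoted "else if" and on the complaint that these characters can never
be displayed.

```c
#include <stddef.h>

#define NO_GLYPH      (-1)  /* toy_lookup: font has no glyph for this code point */
#define ZERO_WIDTH    (-2)  /* toy_lookup: one of the seven zero-width characters */
#define DRAW_NOTHING  (-1)  /* glyph_to_draw: emit nothing, leave the cursor alone */
#define DEFAULT_GLYPH 2     /* toy font slot holding the replacement box */

/* Toy font: maps a handful of code points to glyph slots. */
struct toy_glyph {
    unsigned long ucs;
    int slot;
};

static const struct toy_glyph toy_font[] = {
    { 0x0041, 0 },              /* LATIN CAPITAL LETTER A */
    { 0x0042, 1 },              /* LATIN CAPITAL LETTER B */
    { 0x25a0, DEFAULT_GLYPH },  /* BLACK SQUARE */
};

/* Zero-width test first (assumed order), then a linear font lookup. */
static int toy_lookup(unsigned long ucs)
{
    size_t i;

    if (ucs == 0xfeff || (ucs >= 0x200a && ucs <= 0x200f))
        return ZERO_WIDTH;
    for (i = 0; i < sizeof toy_font / sizeof toy_font[0]; i++)
        if (toy_font[i].ucs == ucs)
            return toy_font[i].slot;
    return NO_GLYPH;
}

/* Slot to draw, or DRAW_NOTHING for the zero-width characters. */
static int glyph_to_draw(unsigned long ucs)
{
    int slot = toy_lookup(ucs);

    if (slot == ZERO_WIDTH)
        return DRAW_NOTHING;
    if (slot == NO_GLYPH)
        return DEFAULT_GLYPH;   /* everything else unknown gets the box */
    return slot;
}
```

In this model a ZWJ (200D) draws nothing even if the font were to carry
a glyph for it, while an unmapped character such as U+4E2D falls back
to the default glyph.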
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/