Title: RE: non-breaking space


> -----Original Message-----
> From: Markus Kuhn [mailto:[EMAIL PROTECTED]]
...
> The only non-printable characters in Unicode are control characters
> whose general category code in the Unicode database starts
> with C, i.e.

Well, nearly.

1. C stands for "Other", not "Control".  Cc is control, and Cf
is format (control).

2. Some of the private use characters (category Co in the Unicode
database) may well be printable (unless they are device or
format controls of some kind), given a suitable font and whatever
other data is needed about which (private) assignments have been made.

3. Category Cn (no data lines listed in the Unicode database)
is implicit for unassigned code points that may well become
assigned in the future (except those that are permanently
reserved). Unless the new characters have complex rendering
behaviour (and that is not handled by the font), all that should
be needed is to install and use a font that includes the new
characters of interest.  I.e., there should be no "compiled in"
"knowledge" that the characters that were of category Cn are
"unprintable".  That way old programs can more gracefully
handle newer fonts.
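
As a rough illustration of points 1-3, here is a minimal Python sketch
using the standard unicodedata module. The sample code points and the
helper names (in_c_group, known_nonprinting) are my own assumptions for
illustration, and which code points count as Cn depends on the Unicode
version bundled with the Python build in use.

```python
import unicodedata

# Hypothetical sample code points, chosen only for illustration.
SAMPLES = {
    "BEL, a control character": "\u0007",
    "NO-BREAK SPACE": "\u00a0",
    "first private use code point": "\ue000",
    "code point unassigned in current Unicode versions": "\u0378",
}


def general_category(ch):
    """Return the two-letter general category, e.g. 'Cc', 'Co', 'Zs'."""
    return unicodedata.category(ch)


def in_c_group(ch):
    """True if the category starts with C ("Other": Cc, Cf, Cs, Co, Cn)."""
    return general_category(ch).startswith("C")


def known_nonprinting(ch):
    """Assumed policy: only Cc and Cf are treated as non-printing here.

    Co (private use) and Cn (unassigned) are left to the font, so that
    newer fonts can still be used by older programs.
    """
    return general_category(ch) in ("Cc", "Cf")


if __name__ == "__main__":
    for label, ch in SAMPLES.items():
        print(f"U+{ord(ch):04X} {general_category(ch)} "
              f"C-group={in_c_group(ch)} "
              f"nonprinting={known_nonprinting(ch)}  ({label})")
```

The design point is only that "category starts with C" and "cannot be
printed" are different tests; a real iswprint() would need more care.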

In addition, e.g. zero-width space does not show (or take up
a "cell") either.

> those characters that get printed with
>
> $ egrep '^[^;]*;[^;]*;C' UnicodeData-Latest.txt
>
> http://www.unicode.org/Public/UNIDATA/UnicodeData-Latest.txt
> http://www.unicode.org/Public/UNIDATA/UnicodeData.html
>
> For the soft hyphen (SHY, 173=0xAD), the discussion might be
> a bit more
> tricky (see <http://www.hut.fi/~jkorpela/shy.html> for a good
> discussion), but I would also classify that one as printable as well,
> and so does Unicode.

It's visible in rendering when an automatic line wrap follows it.  It should
not be shown if there is no (automatic) line wrap immediately after it.  And
the character itself should NOT be removed when a paragraph is reflowed.
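
The soft hyphen behaviour described above can be sketched roughly as
follows, in Python. This is a simplified greedy wrapper written only for
illustration: the function name wrap_with_shy is hypothetical, it counts
one cell per character, and it breaks only at U+0020 and at soft hyphens.
The stored text is never modified; only the display lines are derived
from it.

```python
SHY = "\u00ad"  # SOFT HYPHEN


def wrap_with_shy(text, width):
    """Greedy wrap of `text` into display lines of at most `width` cells.

    A soft hyphen is shown as "-" only when a line break is taken
    immediately after it; otherwise it is not shown at all.
    """
    lines = []
    cur = ""
    for word in (w for w in text.split(" ") if w):
        frags = word.split(SHY)  # pieces between soft hyphens
        while frags:
            sep = " " if cur else ""
            whole = "".join(frags)
            if len(cur) + len(sep) + len(whole) <= width:
                cur += sep + whole  # fits: soft hyphens stay invisible
                frags = []
                continue
            # Longest leading run of fragments that fits with a hyphen.
            k = len(frags) - 1
            while k > 0 and (len(cur) + len(sep)
                             + len("".join(frags[:k])) + 1) > width:
                k -= 1
            if k > 0:
                lines.append(cur + sep + "".join(frags[:k]) + "-")
                cur = ""
                frags = frags[k:]
            elif cur:
                lines.append(cur)  # break at the space instead
                cur = ""
            else:
                # Nothing fits even on an empty line: emit it overlong.
                head = frags.pop(0)
                if frags:
                    lines.append(head + "-")
                else:
                    cur = head
    if cur:
        lines.append(cur)
    return lines


if __name__ == "__main__":
    text = "a hyphen" + SHY + "ation demo"
    for w in (9, 20):
        print(w, wrap_with_shy(text, w))
    # The stored text keeps its soft hyphen for any later reflow.
    print(SHY in text)
```

Reflowing to a different width just means calling the function again on
the same stored text, which is why the character must stay in the data.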

                /Kent K

> (Note that some of the X11 fonts lack character 160, so with these
> broken fonts, NBSP is indeed non-printable, but this font bug will be
> fixed soon. I can only guess that knowledge of this font problem might
> have been the origin of iswprint(160) == 0 in glibc.)
>
> Markus
>
> --
> Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
> Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>
>
> -
> Linux-UTF8:   i18n of Linux on all levels
> Archive:      http://mail.nl.linux.org/lists/
>
