Bruno Haible wrote on 2000-09-25 15:07 UTC:
> > isControl c = c < ' ' || c >= '\x7F' && c <= '\x9F'
>
> Here I would add: category is one of [Zl,Zp]
> because the Line/Paragraph Separators behave like LineFeed.
This depends on the environment (locale?).
In xterm and other UCS terminal emulators, LS/PS are currently treated
like any other undefined character: a default-character box is printed.
The Line/Paragraph Separators might perhaps behave like LineFeed inside
some word processors (which ones?). I very much hope that they will not
show up in UTF-8 plain text files on POSIX systems. It would break the
original ASCII compatibility of UTF-8 significantly to introduce an
alternative for LF, with security consequences as severe as the decoding
of overlong UTF-8 sequences. In POSIX applications that parse plain text
files, treating LS/PS just like any other unassigned character is
probably the best thing to do. In other words, all the is????()
functions should return 0.
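
To make that concrete, here is a minimal C sketch of what I have in
mind. It is not taken from any real libc; the names sketch_is_control
and sketch_count_lines are hypothetical, and the classification simply
mirrors the isControl definition quoted above, assuming code points are
passed as 32-bit integers and text arrives as raw UTF-8 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: mirrors the quoted isControl definition.
   Only C0 controls, DEL and C1 controls count as control characters;
   U+2028 (LS) and U+2029 (PS) fall through and yield 0. */
static int sketch_is_control(uint32_t c)
{
    return c < 0x20 || (c >= 0x7F && c <= 0x9F);
}

/* Hypothetical helper: counts line terminators in a UTF-8 buffer.
   Only the single byte 0x0A (LF) ends a line. The three-byte
   sequences E2 80 A8 (LS) and E2 80 A9 (PS) are passed over like
   any other multi-byte character. */
static size_t sketch_count_lines(const unsigned char *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == 0x0A)
            n++;
    return n;
}
```

With these, a buffer holding "a", LS, "b", LF counts as one line, not
two, so every tool that splits on LF alone agrees on where lines end.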
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/