Florian Weimer wrote on 2000-08-10 09:37 UTC:
> The problem is the C API: You can't query character properties of
> surrogate pairs using isupper() and friends.

Keep in mind that this problem is quite likely purely academic anyway.
Nobody will be seriously interested in whether anything outside the BMP
is uppercase or lowercase. The material outside the BMP is so far
removed semantically from normal alphabets with equivalent uppercase and
lowercase pairs that these simple ASCII concepts of the C API fail there
anyway. What should toupper() do for mathematical variables (speed "v"
has absolutely nothing to do with volume "V" in physics or mathematics,
and the two must not be normalized or converted into each other!),
hieroglyphics, Etruscan, Klingon, Tengwar, etc.?

If the <ctype.h> functions (isspace(), etc.) make any sense for a
character, that character will most likely find its way into the BMP
anyway.
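
For reference, the API limitation in the quoted text can be sketched in
C as follows. This is only an illustration, assuming a platform where
the wide character unit is 16 bits (modelled here with uint16_t). The
helper names are hypothetical and not part of <ctype.h> or <wctype.h>,
and U+10400 is just an arbitrary code point outside the BMP.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of any standard header. */

/* High (leading) surrogates occupy U+D800..U+DBFF. */
int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

/* Low (trailing) surrogates occupy U+DC00..U+DFFF. */
int is_low_surrogate(uint16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

/* Combine a surrogate pair into the code point it encodes. */
uint32_t combine_surrogates(uint16_t hi, uint16_t lo)
{
    assert(is_high_surrogate(hi) && is_low_surrogate(lo));
    return 0x10000u + (((uint32_t)(hi - 0xD800) << 10)
                       | (uint32_t)(lo - 0xDC00));
}

/* A classification function that accepts a single 16-bit unit can only
   be handed values up to 0xFFFF, i.e. characters inside the BMP. */
int fits_in_one_unit(uint32_t cp) { return cp <= 0xFFFFu; }
```

Here combine_surrogates(0xD801, 0xDC00) gives 0x10400, which does not
fit in one 16-bit unit, so a per-character call in the style of
isupper() could only ever be given one half of the pair at a time.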

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/
