> -----Original Message-----
> From: Markus Kuhn [mailto:[EMAIL PROTECTED]]
...
> Florian Weimer wrote on 2000-08-10 09:37 UTC:
> > The problem is the C API: You can't query character properties of
> > surrogate pairs using isupper() and friends.
And you cannot query the properties of non-ASCII characters by calling
isupper() 'and friends' on isolated UTF-8 code units either: only
ASCII characters fit in a single UTF-8 code unit; no other character
does. Same problem.
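
To illustrate, here is a minimal sketch in C, assuming the default "C"
locale (no setlocale() call). The helper names count_upper_units() and
is_single_unit_char() are made up for this example and are not part of
any standard API.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the code units (bytes) of the UTF-8
 * string s for which isupper() reports true. In the "C" locale,
 * isupper() is true only for the letters 'A' to 'Z'. */
size_t count_upper_units(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        /* The cast matters: passing a negative char value to
         * isupper() is undefined behaviour. */
        if (isupper((unsigned char)*s))
            ++n;
    }
    return n;
}

/* Hypothetical helper: a UTF-8 code unit encodes a whole character
 * by itself only when its value is below 0x80, i.e. when it is ASCII. */
int is_single_unit_char(unsigned char unit)
{
    return unit < 0x80;
}
```

With these, count_upper_units("E") gives 1, while
count_upper_units("\xC3\x89") gives 0 even though those two code units
encode U+00C9, LATIN CAPITAL LETTER E WITH ACUTE: neither code unit is
a character on its own, so the per-unit query has nothing meaningful
to answer.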
> Make sure to appreciate that this problem is quite likely purely
> academic anyway. Nobody will be seriously interested, whether anything
> outside the BMP is uppercase or lowercase. The stuff outside the BMP is
> so far away in semantics from normal alphabets with equivalent upper and
> lower case pairs that these simple ASCII concepts of the C API fail
> there anyway. What is toupper() doing for mathematical variables (speed
> "v" has absolutely nothing to do with volume "V" in physics or
> mathematics and the two must not be normalized or converted into each
> other!), hieroglyphics, Etruscan, Klingon, Tengwar, etc.?
>
> If the <ctype.h> functions (isspace(), etc.) make any sense on a
> character, it will most likely find its way into the BMP anyway.
There is absolutely nothing preventing "off-BMP" characters from
coming in case pairs.
Kind regards
/kent k
