Fri, 29 Sep 2000 10:14:05 +1100, Manuel M. T. Chakravarty <[EMAIL PROTECTED]> wrote:
> The question is how do you *know* which range, eg, the alphanumeric
> characters in a given unicode encoding have? This is certainly
> different in Dutch and Japanese.

It is not different in practice.
And even if it were, the types of Haskell's character predicates
(Char -> Bool) require them to be constant: a pure function of a
character cannot consult the current locale.
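
For illustration, a small check, assuming a modern GHC whose Data.Char predicates are backed by Unicode tables (not the 2000-era implementation discussed here). Because each predicate has type Char -> Bool, the answer for a given character is the same whatever locale the program runs in.

```haskell
import Data.Char (isAlpha, isUpper)

-- isAlpha and isUpper take no locale argument and do no IO,
-- so their results depend only on the character itself.
main :: IO ()
main = do
  print (isAlpha 'é')  -- Latin small letter e with acute
  print (isAlpha '日') -- CJK ideograph (category Lo)
  print (isUpper 'Ą')  -- Latin capital letter A with ogonek
```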

In C the situation is different, because the meaning of both char and
wchar_t may depend on the current locale, whereas in Haskell Char is
always Unicode.
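
A minimal sketch of what "Char is always Unicode" means in practice: ord and chr convert between a character and its Unicode code point with no locale involved. The helper codePoint is hypothetical, written only for this example.

```haskell
import Data.Char (chr, ord, toUpper)
import Numeric (showHex)

-- Hypothetical helper: format a character's code point as U+XXXX.
-- ord returns the Unicode code point directly; no encoding or locale
-- is consulted, unlike C's char/wchar_t.
codePoint :: Char -> String
codePoint c = "U+" ++ replicate (4 - length hex) '0' ++ hex
  where hex = map toUpper (showHex (ord c) "")

main :: IO ()
main = do
  putStrLn (codePoint 'ą')   -- U+0105
  putStrLn (codePoint '€')   -- U+20AC
  print (chr 0x0105 == 'ą')  -- True
```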

My implementation uses a static table of the official Unicode character
categories, and the predicates test the category, sometimes with
exceptions. The details are being discussed with people on other
mailing lists; the mapping between categories and predicates is not
obvious.
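
To make "predicates test the category" concrete, here is a sketch using generalCategory from today's Data.Char. The predicate isAlphaNumByCategory is hypothetical and is not the implementation described above; it simply treats every letter category (Lu, Ll, Lt, Lm, Lo) and number category (Nd, Nl, No) as alphanumeric, with none of the exceptions a real implementation might add.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Hypothetical category-based predicate: a character is alphanumeric
-- exactly when its Unicode general category is a letter or a number.
isAlphaNumByCategory :: Char -> Bool
isAlphaNumByCategory c = generalCategory c `elem`
  [ UppercaseLetter, LowercaseLetter, TitlecaseLetter, ModifierLetter, OtherLetter
  , DecimalNumber, LetterNumber, OtherNumber ]

-- Print each sample character with its category and the predicate's verdict.
-- '٣' is ARABIC-INDIC DIGIT THREE (DecimalNumber); '_' is ConnectorPunctuation.
main :: IO ()
main = mapM_ (\c -> print (c, generalCategory c, isAlphaNumByCategory c)) "aZ٣_ "
```

Whether, say, '_' or combining marks should count is exactly the kind of mapping question that is not obvious from the categories alone.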

The situation is worse with toUpper/toLower: not only may the mapping
depend on the locale (the best-known case being Turkish), but it also
need not map one character to exactly one character. That's why in
Unicode the toUpper/toLower mapping is informative, even though
isUpper/isLower is normative. Haskell's toUpper/toLower must be
stateless and of type Char -> Char. In the future there will probably
be a stateful locale framework in Haskell, with, e.g., locale-dependent
string comparison and correct String -> String case mapping.
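
A sketch of why Char -> Char is not enough, assuming modern GHC's Data.Char (which uses Unicode's simple one-to-one case mappings). The Locale type and upperString are hypothetical, standing in for the kind of locale-aware String -> String mapping mentioned above; only two special cases (German ß and Turkish dotted i) are handled, whereas the full Unicode rules live in SpecialCasing.txt.

```haskell
import Data.Char (toUpper)

-- Hypothetical locale tag; a real locale framework would be much richer.
data Locale = Root | Turkish deriving (Eq, Show)

-- Hypothetical locale-aware upper-casing of whole strings.
-- Special cases first; everything else falls back to Char -> Char toUpper.
upperString :: Locale -> String -> String
upperString loc = concatMap up
  where
    up 'ß'                  = "SS"    -- one character becomes two
    up 'i' | loc == Turkish = "\x130" -- LATIN CAPITAL LETTER I WITH DOT ABOVE
    up c                    = [toUpper c]

main :: IO ()
main = do
  print (map toUpper "straße")           -- ß stays ß: no one-char uppercase
  print (upperString Root "straße")      -- "STRASSE"
  print (upperString Turkish "istanbul") -- "İSTANBUL" (print shows "\304STANBUL")
```

Because the result of upper-casing one character may be several characters, and may depend on the locale, neither fact fits the stateless Char -> Char type.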
--
__("< Marcin Kowalczyk * [EMAIL PROTECTED] http://qrczak.ids.net.pl/
\__/
 ^^                     SUBSTITUTE SIGNATURE
QRCZAK