Fri, 29 Sep 2000 10:14:05 +1100, Manuel M. T. Chakravarty <[EMAIL PROTECTED]> wrote:

> The question is how do you *know* which range, eg, the alphanumeric
> characters in a given unicode encoding have?  This is certainly
> different in Dutch and Japanese.

This is not different in practice.

Even if it were different, the types of Haskell's character predicates
(Char -> Bool) require them to be constant.

In C the situation is different because the meaning of both char and
wchar_t may depend on the current locale, whereas in Haskell Char is
always Unicode.

My implementation uses a static table of official Unicode character
categories, and predicates test the category, sometimes with
exceptions. Details are being discussed with people on other mailing
lists - the mapping between categories and predicates is not obvious.
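
A minimal sketch of the category-based approach, assuming today's
Data.Char.generalCategory as a stand-in for the static table (it is
not the implementation described above). The names isAlphaByCategory
and isSpaceByCategory are hypothetical, and the choice of categories
and exceptions is only one possible mapping:

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Hypothetical predicate: alphabetic means "any letter category".
isAlphaByCategory :: Char -> Bool
isAlphaByCategory c = case generalCategory c of
  UppercaseLetter -> True
  LowercaseLetter -> True
  TitlecaseLetter -> True
  ModifierLetter  -> True
  OtherLetter     -> True
  _               -> False

-- Hypothetical predicate with exceptions: the Space category alone
-- misses tab, newline, etc. (their category is Control), so those
-- characters are listed explicitly.
isSpaceByCategory :: Char -> Bool
isSpaceByCategory c
  | c `elem` "\t\n\r\f\v" = True
  | otherwise             = generalCategory c == Space
```

Both functions are pure Char -> Bool, so their answers cannot vary
with any locale setting.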

The situation is worse with toUpper/toLower: not only may the mapping
depend on the locale (the best-known case being Turkish), but it also
need not map one character to exactly one character. That's why in
Unicode the toUpper/toLower mapping is informative, even though
isUpper/isLower is normative. Haskell's toUpper/toLower must be
stateless and Char->Char. In the future there will probably be a
stateful locale framework in Haskell, with e.g. locale-dependent
string comparison and correct String->String case mapping.
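
To illustrate why Char->Char is too narrow, here is a small sketch of
a String->String uppercase mapping. The names upperChar and
upperString are hypothetical, only one special case is handled (German
sharp s, U+00DF, whose full uppercase form is "SS"), and no locale is
consulted, so this is far from a correct general case mapping:

```haskell
import Data.Char (toUpper)

-- Hypothetical helper: map one character to its uppercase string.
-- '\223' is U+00DF (sharp s); everything else falls back to the
-- stateless Char -> Char toUpper.
upperChar :: Char -> String
upperChar '\223' = "SS"
upperChar c      = [toUpper c]

-- Hypothetical String -> String mapping; the result may be longer
-- than the input.
upperString :: String -> String
upperString = concatMap upperChar
```

For example, upperString "stra\223e" yields "STRASSE", whereas
map toUpper leaves the sharp s unchanged because no single-character
uppercase mapping exists for it.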

-- 
 __("<  Marcin Kowalczyk * [EMAIL PROTECTED] http://qrczak.ids.net.pl/
 \__/
  ^^                      SYGNATURA ZASTĘPCZA
QRCZAK

