> > Glynn Clements wrote:
> >> What Unicode support?
> >
> >> Simply claiming that values of type Char are Unicode characters
> >> doesn't make it so.

Well, *claiming* so doesn't make it so. But actually representing
characters in such a way that the Unicode conformance rules are
followed does make it so. There is no requirement for a particular
API, for instance.

> > Just because some implementations lack toUpper etc. doesn't mean
> > they all do.

toUpper etc. are overrated. They are very rarely used in real life,
or at least should be very rarely used. The few exceptions are
auto-titlecasing of the first word of a sentence (which I find rather
handy for natural language texts) and making "small caps" (some fonts
do that internally, but that's a mistake, since it is then not
language dependent).

Some things that are much more interesting and of practical use are:
Unicode normalisation, transformation between encoding forms (mainly
for I/O), finding formal character (or rather, code point) properties,
line breaking, combining character handling, language dependent
collation (UCA based), decimal number parsing and formatting (for
several scripts), regular expressions generalised to Unicode
(including support for "default ignorable"), ...

Case mapping falls rather low on the priority list. The one exception
is perhaps the special form of "case folding" (almost lowercasing, but
not quite) used for IDNs; it could also be used for Ada, SQL, etc.,
which "ignore" case.

By the way, for line breaking Thai, Lao, or Khmer, you need a
dictionary. ZERO WIDTH SPACE can be used between words, but normally
isn't.

> I think the point is that for toUpper etc to be properly Unicoded,
> they can't simply look at a single character. IIRC, there are some
> characters that expand to two characters when the case is changed,

Yes, for instance ß (sharp s). The uppercase of ß is SS. For proper
lowercasing you need a dictionary, since SS lowercases to ß in some
words and to ss in others. Case mapping is also language dependent:
Lithuanian and Turkish/Azerbaijani have exceptions to what is done
elsewhere.

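To make the ß case concrete, here is a small sketch in Python (not
Haskell; picked only because its built-in str methods apply the full,
string-to-string case mappings). The helper simple_upper is
hypothetical and only approximates the kind of one-code-point-at-a-time
mapping that a Char -> Char toUpper is limited to.

```python
def simple_upper(text: str) -> str:
    """Uppercase one code point at a time.

    Hypothetical helper, an approximation only: any code point whose
    full uppercase form is longer than one code point is left unchanged,
    which is all a character-to-character function can do for ß.
    """
    return "".join(
        ch.upper() if len(ch.upper()) == 1 else ch
        for ch in text
    )


word = "straße"

per_char = simple_upper(word)  # ß is left alone: no one-code-point uppercase
full = word.upper()            # full case mapping: ß expands to SS
round_trip = full.lower()      # lowercasing cannot tell that SS came from ß
folded = word.casefold()       # case folding, meant for caseless matching

print(per_char, full, round_trip, folded)
```

The full mapping makes the string one code point longer, and lowercasing
the result does not give back the original word. These built-in methods
are not language dependent, so the Turkish/Azerbaijani and Lithuanian
exceptions are not covered by this sketch.
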
See http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt

/kent k
_______________________________________________
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell