"Daniel Gibson" <[email protected]> wrote in message news:[email protected]...
> On 10.01.2011 22:16, Michel Fortin wrote:
>>
>> Comparing the lowercase version of two strings works well for ASCII, but
>> I doubt it works very well for Unicode. Case conversion is not
>> bidirectional (for instance both 'SS' and 'ß' become 'ss' in lowercase
>> in German),
>
> That's wrong: 'ß' is lowercase, and no uppercase version is really used,
> though one exists in Unicode (see:
> http://en.wikipedia.org/wiki/Capital_%C3%9F ).
> Sometimes, when text is written in all caps, 'ß' (which is never the
> first character of a word) is replaced by "SS", but I wouldn't expect that
> to compare equal with icmp() (e.g. "Strings vergleichen macht keinen Spaß!"
> vs "STRINGS VERGLEICHEN MACHT KEINEN SPASS!").
>
> Anyway, in this case comparing in lowercase would cause no trouble at all
> (comparing in uppercase, however, would, unless you use the
> not-really-existing-but-defined-by-Unicode capital ß).
>
> I don't know whether there are problems with special characters in other
> languages, though.
>
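To make the ß point concrete, here is a small sketch in Python, chosen only for illustration (the thread is about D's Phobos, whose icmp() may behave differently). It assumes Python 3's str.lower(), str.upper() and str.casefold(), which apply Unicode's default, locale-independent full case mappings and case folding, and compares Daniel's two example sentences under each strategy. The helper names are hypothetical.

```python
# Sketch: compare Daniel's example sentences under three case-insensitive strategies.
# Python's built-in Unicode case mappings stand in for Phobos's icmp() here.

mixed = "Strings vergleichen macht keinen Spaß!"
caps = "STRINGS VERGLEICHEN MACHT KEINEN SPASS!"


def equal_lower(x: str, y: str) -> bool:
    """Compare after full Unicode lowercasing ('ß' stays 'ß')."""
    return x.lower() == y.lower()


def equal_upper(x: str, y: str) -> bool:
    """Compare after full Unicode uppercasing ('ß' uppercases to 'SS')."""
    return x.upper() == y.upper()


def equal_casefold(x: str, y: str) -> bool:
    """Compare after full Unicode case folding ('ß' folds to 'ss')."""
    return x.casefold() == y.casefold()


if __name__ == "__main__":
    print("lower:   ", equal_lower(mixed, caps))     # False
    print("upper:   ", equal_upper(mixed, caps))     # True
    print("casefold:", equal_casefold(mixed, caps))  # True
    # The capital sharp s (U+1E9E) lowercases to the ordinary 'ß' (U+00DF).
    print("\u1e9e".lower() == "\u00df")              # True
```

Lowercasing keeps "Spaß" and "SPASS" apart, as Daniel expects, while uppercasing and full case folding both treat them as equal. The capital ß lowercases to the ordinary 'ß', so it causes no trouble for a lowercase comparison.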
One of the Unicode documents mentions an example involving the three Greek "sigma" letters, although I never quite understood how it demonstrated the inadequacy of using lowercase:

http://www.unicode.org/reports/tr21/tr21-5.html#Caseless_Matching

...which references some information near the end of this subsection:

http://www.unicode.org/reports/tr21/tr21-5.html#Introduction

Actually, what should probably be stored is a *normalized*, case-folded version of the string, because then (if I understand correctly) memcmp could be used. I don't think memcmp technically works on non-ASCII text unless it's in normalized form.

In any case, Phobos doesn't currently handle any of that at all, so my case-insensitive string type wouldn't be taking things backwards in that regard.
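One way to see what the sigma example points at, and how a normalized, case-folded key enables plain byte comparison, is the following Python sketch (again an illustration, not Phobos code). Capital Σ lowercases to σ, but final ς stays ς, so comparing lowercase versions treats σ and ς as different even though both uppercase to Σ; case folding maps all three to σ. The key is built as NFD(casefold(NFD(s))), which is how the Unicode Standard (section 3.13) defines canonical caseless matching. It assumes Python 3's str.casefold() and unicodedata.normalize(); caseless_key and CaselessString are hypothetical names, not an existing API.

```python
import unicodedata

# The three Greek sigma letters.
CAPITAL_SIGMA = "\u03a3"  # Σ
SMALL_SIGMA = "\u03c3"    # σ (medial form)
FINAL_SIGMA = "\u03c2"    # ς (word-final form)
SIGMAS = (CAPITAL_SIGMA, SMALL_SIGMA, FINAL_SIGMA)

# Lowercasing leaves two distinct results (σ and ς); uppercasing and folding give one.
lowered = {c.lower() for c in SIGMAS}
uppered = {c.upper() for c in SIGMAS}
folded = {c.casefold() for c in SIGMAS}


def caseless_key(s: str) -> bytes:
    """Canonical caseless key: NFD(casefold(NFD(s))), encoded as UTF-8.

    Two strings match caselessly (canonically) when their keys are equal
    byte for byte, so a memcmp-style comparison of the keys is enough.
    """
    decomposed = unicodedata.normalize("NFD", s)
    return unicodedata.normalize("NFD", decomposed.casefold()).encode("utf-8")


class CaselessString:
    """Hypothetical case-insensitive string type: keeps the original text plus
    a precomputed key, so equality and hashing reduce to byte comparisons."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.key = caseless_key(text)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, CaselessString):
            return NotImplemented
        return self.key == other.key

    def __hash__(self) -> int:
        return hash(self.key)


if __name__ == "__main__":
    print(len(lowered), len(uppered), len(folded))  # 2 1 1
    composed = "Caf\u00e9"      # 'é' as a single code point
    decomposed = "CAFE\u0301"   # 'E' followed by a combining acute accent
    print(composed.casefold() == decomposed.casefold())            # False
    print(CaselessString(composed) == CaselessString(decomposed))  # True
```

Computing the key once and storing it is what makes the later byte comparison valid: without normalization, a precomposed 'é' and an 'e' plus combining accent fold to different code point sequences and would not compare equal.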
