"Michel Fortin" <[email protected]> wrote in message news:[email protected]... > On 2011-01-10 13:46:55 -0500, "Nick Sabalausky" <[email protected]> said: > >> Not carrying any other data means not caching the lowercase version, >> which >> means recreating the lowercase version more than necessary. So it's the >> classic speed vs. space tradeoff. I would think there would be cases >> where >> they get compared enough for that to make a difference, although I >> suppose >> we'd really need benchmarks to see. OTOH, there are certainly cases (such >> as >> my original motivating case) where the extra space is not an issue at >> all. > > Comparing the lowercase version of two strings works well for ASCII, but I > doubt it works very well for Unicode. Case conversion is not bidirectional > (for instance both 'SS' and 'ß' become 'ss' in lowercase in German), and > what's equal and what is not sometime depends on the language. > > Checking for string equality is a special case of the Unicode collation > algorithm. I'm not sure if implementing this part of Unicode is in the > scope of Phobos (probably not), but short of having Unicode support it > seems the utility of having a special string type dedicated to ASCII > case-insensitive strings is quite limited. >
Yeah, Phobos doesn't even have folding-case functions yet (which is why I keep saying "lowercase"). (This is actually one place where Phobos is still behind Tango.) However, I really think that's orthogonal to this, since std.string.icmp doesn't handle such non-English issues either (just the English a-z, A-Z, and that's it). When Phobos does become multilingual, this can be updated to follow suit.

One question though: Aren't 'SS' and 'ß' considered the same in German anyway? If so, how does using lowercase instead of folding case cause a problem?
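
To make the lowercase vs. folding-case distinction concrete, here is a rough sketch in Python rather than D, chosen only because its built-in `str.lower` and `str.casefold` expose both operations. It says nothing about how Phobos or Tango behave. The three helper names are made up for illustration, and `equal_ascii_only` is just an approximation of the "a-z, A-Z, and that's it" behaviour described above, not a port of std.string.icmp. Under Python's Unicode mappings, lowercasing leaves 'ß' as it is while case folding turns it into 'ss', so the two comparisons can disagree on 'SS' vs. 'ß'.

```python
import string

# Hypothetical helpers for illustration only; none of these names come
# from Phobos, Tango, or the thread.

# Maps only A-Z to a-z and leaves every other character untouched.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def equal_ascii_only(a: str, b: str) -> bool:
    """Compare after folding only the ASCII letters A-Z."""
    return a.translate(_ASCII_LOWER) == b.translate(_ASCII_LOWER)


def equal_by_lowercase(a: str, b: str) -> bool:
    """Compare the Unicode lowercase versions of both strings."""
    return a.lower() == b.lower()


def equal_by_case_folding(a: str, b: str) -> bool:
    """Compare the Unicode case-folded versions of both strings."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    for left, right in [("Hello", "hELLO"), ("SS", "ß"), ("STRASSE", "straße")]:
        print(
            left,
            right,
            equal_ascii_only(left, right),
            equal_by_lowercase(left, right),
            equal_by_case_folding(left, right),
        )
```

With plain ASCII input all three helpers agree, which is the case the original proposal targets. They only start to differ once characters like 'ß' show up.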
