On 10.01.2011 at 22:16, Michel Fortin wrote:
On 2011-01-10 13:46:55 -0500, "Nick Sabalausky" <[email protected]> said:
Not carrying any other data means not caching the lowercase version, which
means recreating the lowercase version more than necessary. So it's the
classic speed vs. space tradeoff. I would think there would be cases where
they get compared enough for that to make a difference, although I suppose
we'd really need benchmarks to see. OTOH, there are certainly cases (such as
my original motivating case) where the extra space is not an issue at all.
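To make the tradeoff concrete, here is a minimal sketch in Python (not D, and not the type proposed in this thread). The names `CaseInsensitiveStr` and `equal_without_cache` are hypothetical. The first variant stores one extra lowercased copy per string; the second stores nothing and lowercases both operands on every comparison.

```python
class CaseInsensitiveStr:
    """Hypothetical wrapper that caches a lowercased copy for comparisons."""

    __slots__ = ("original", "_lowered")

    def __init__(self, text: str) -> None:
        self.original = text
        # Space cost: one additional copy of the string, computed once.
        self._lowered = text.lower()

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, CaseInsensitiveStr):
            return NotImplemented
        # Time saving: no case conversion happens here.
        return self._lowered == other._lowered

    def __hash__(self) -> int:
        return hash(self._lowered)


def equal_without_cache(a: str, b: str) -> bool:
    # No extra storage, but both strings are lowercased on every call.
    return a.lower() == b.lower()
```

Which variant wins presumably depends on how often each string is compared relative to how many strings are alive at once, which is why benchmarks would be needed.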
Comparing the lowercase version of two strings works well for ASCII, but I doubt
it works very well for Unicode. Case conversion is not bidirectional (for
instance both 'SS' and 'ß' become 'ss' in lowercase in German),
That's wrong: 'ß' is lowercase, and no uppercase version is really used in
practice, though one exists in Unicode (see: http://en.wikipedia.org/wiki/Capital_%C3%9F ).
Sometimes, when text is written in all caps, 'ß' (which is never the first
character of a word) is replaced by "SS", but I wouldn't expect the two to
compare equal with icmp() (e.g. "Strings vergleichen macht keinen Spaß!" vs "STRINGS
VERGLEICHEN MACHT KEINEN SPASS!").
Anyway, in this case comparing in lowercase would cause no trouble at all
(comparing in uppercase, however, would, unless you use the
not-really-existing-but-defined-by-Unicode capital ß).
I don't know if there may be problems with special characters in other
languages, though.
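The point about 'ß' can be checked with a short sketch. It uses Python's built-in `str.lower`, `str.upper` and `str.casefold`, which follow the Unicode default case mappings. This is only an illustration of those mappings; it says nothing about how icmp() in Phobos behaves or should behave.

```python
lower_sharp_s = "\u00df"    # 'ß', LATIN SMALL LETTER SHARP S
capital_sharp_s = "\u1e9e"  # 'ẞ', LATIN CAPITAL LETTER SHARP S

a = "Strings vergleichen macht keinen Spaß!"
b = "STRINGS VERGLEICHEN MACHT KEINEN SPASS!"

# 'ß' is already lowercase; uppercasing it yields the two letters "SS".
sharp_s_lowered = lower_sharp_s.lower()
sharp_s_uppered = lower_sharp_s.upper()

# The capital form defined by Unicode lowercases back to 'ß'.
capital_lowered = capital_sharp_s.lower()

# Lowercase comparison keeps 'ß' and "ss" distinct.
equal_in_lowercase = a.lower() == b.lower()

# Uppercase comparison and full case folding both merge them.
equal_in_uppercase = a.upper() == b.upper()
equal_when_folded = a.casefold() == b.casefold()
```

Under these mappings the two example sentences differ when compared in lowercase but match when compared in uppercase or after case folding, so the choice of conversion decides whether "Spaß" and "SPASS" count as equal.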
and what's equal
and what is not sometimes depends on the language.
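A well-known instance of this language dependence is the Turkish dotted and dotless i. The sketch below again uses Python's locale-independent case mappings, purely as an illustration; a Turkish-aware comparison would need different rules, which are not shown here.

```python
dotted_capital_i = "\u0130"  # 'İ', LATIN CAPITAL LETTER I WITH DOT ABOVE
dotless_small_i = "\u0131"   # 'ı', LATIN SMALL LETTER DOTLESS I

# The default mapping lowercases 'İ' to 'i' followed by a combining dot
# above (U+0307), so the result is two code points long.
lowered_dotted_i = dotted_capital_i.lower()

# 'ı' and 'i' are distinct letters in Turkish, yet both uppercase to 'I'
# under the default mapping, so an uppercase comparison conflates them.
both_uppercase_to_I = dotless_small_i.upper() == "i".upper() == "I"
```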
Checking for string equality is a special case of the Unicode collation
algorithm. I'm not sure if implementing this part of Unicode is in the scope of
Phobos (probably not), but short of having Unicode support it seems the utility
of having a special string type dedicated to ASCII case-insensitive strings is
quite limited.