>> Turkish lowercase i-with-dot is shorter than the uppercase, and
>> uppercase I-without-dot is shorter than the lowercase.
>
> Thanks! With that, I created a test case and watched it malfunction.
> That demonstrated a couple of invalid (in that case) assumptions in the
> new code. I suspect that this bug strikes only in relatively few locales.

Yes, my recollection is that it only happens on Turkish and Azeri locales. Some more strangeness with case mappings occurs in Lithuanian locales (an accented i conserves the dot, or something like that!) but I think glibc doesn't implement that.

Anyhow, thanks for the work on this longstanding bug! I think this was the last regression that was introduced in 2.6, so this is a major achievement in the 2.x series (perhaps we should have called the 2.6 release 3.0).

The next step would be to add support for Unicode character classes, and look into converting other multibyte locales to/from UTF-8 in order to speed up the matches.

Paolo
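
For reference, the length mismatch quoted at the top can be checked with a minimal sketch like the one below. It assumes UTF-8 as the encoding and hard-codes the Turkish case pairs (i with İ, ı with I) rather than relying on any locale being installed; the names `utf8_len` and `length_table` are made up for illustration and are not from grep. It only shows that the two halves of each pair occupy different numbers of bytes, which is presumably the kind of thing code can trip over if it expects case conversion to preserve length.

```python
# Turkish case pairs as (lowercase, uppercase). These are hard-coded
# because Python's str.upper()/str.lower() do not follow the Turkish
# locale rules.
PAIRS = [
    ("i", "\u0130"),  # dotted: i (U+0069) and İ (U+0130)
    ("\u0131", "I"),  # dotless: ı (U+0131) and I (U+0049)
]


def utf8_len(ch: str) -> int:
    """Return the number of bytes `ch` occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def length_table(pairs=PAIRS):
    """Return (lower, lower_bytes, upper, upper_bytes) for each pair."""
    return [(lo, utf8_len(lo), up, utf8_len(up)) for lo, up in pairs]


if __name__ == "__main__":
    for lo, n_lo, up, n_up in length_table():
        print(
            f"lower U+{ord(lo):04X}: {n_lo} byte(s), "
            f"upper U+{ord(up):04X}: {n_up} byte(s)"
        )
```

In both pairs one side is a single ASCII byte and the other is a two-byte sequence, so a buffer sized for the original text may be too small or too large after conversion.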
