The example was the other way around. Changing SS to ß is not a valid
transform, but the reverse (ß to SS) is. There are also similar transforms
involving the combined AE character (Æ), etc.
That Turkish ‘I’ problem is the only case I know of where the collation
actually changes behavior within the usual western alphabet of ASCII characters.
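Whether a given locale shows the Turkish 'I' behavior can be checked by probing tolower() directly. This is a minimal sketch, assuming a POSIX-style C library. The helper name tolower_I_is_i is hypothetical. Locale names such as "tr_TR.ISO8859-9" vary between systems, so the probe returns -1 when the locale is not installed instead of guessing. Because setlocale() changes process-wide state, this belongs in a test program, not in a threaded server.

```c
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: temporarily switch LC_CTYPE to `locname` and report
 * whether tolower('I') yields ASCII 'i' there.
 * Returns 1 if it does, 0 if it does not (as a Turkish locale that maps
 * 'I' to a dotless i may do), or -1 if the locale is not available.
 * setlocale() changes process-wide state: use this in a test program only. */
int tolower_I_is_i(const char *locname)
{
    const char *cur = setlocale(LC_CTYPE, NULL);
    char *saved = NULL;
    int result = -1;

    /* Copy the current name: setlocale() may overwrite its returned buffer. */
    if (cur != NULL) {
        size_t len = strlen(cur) + 1;
        saved = malloc(len);
        if (saved != NULL)
            memcpy(saved, cur, len);
    }

    if (setlocale(LC_CTYPE, locname) != NULL)
        result = (tolower('I') == 'i');

    /* Restore the previous LC_CTYPE setting. */
    if (saved != NULL) {
        setlocale(LC_CTYPE, saved);
        free(saved);
    }
    return result;
}
```

In the "C" locale the probe always reports 1. The interesting call is something like tolower_I_is_i("tr_TR.ISO8859-9") on a machine that has that locale installed. No result is claimed here, because it depends on the platform's locale data.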
Bert
From: Mikhail T. [mailto:[email protected]]
Sent: Wednesday 25 November 2015 23:19
To: [email protected]
Subject: Re: apr_token_* conclusions (was: Better casecmpstr[n]?)
On 25.11.2015 14:10, Mikhail T. wrote:
Two variables, LC_CTYPE and LC_COLLATE, control this text-processing behavior.
The above is the correct lowercase transliteration for Turkish. In German,
the uppercase equivalent of the sharp S (ß) is 'SS', but multi-character
mappings are not provided by the simple tolower()/toupper() functions.
So the concern is that some hypothetical header, such as X-ASSIGN-TO, may,
after going through the locale-aware strtolower(), unexpectedly become x-aßign-to?
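A locale-aware lowercasing routine of this kind is typically a byte-by-byte loop over tolower(). The sketch below assumes that shape. The name lc_strtolower is hypothetical and is not APR's API; it was chosen because C reserves names beginning with "str". Each byte is replaced by exactly one byte, so such a function cannot turn "SS" into "ß" or the reverse. The most it can do is map individual bytes differently, as in the Turkish case.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical locale-aware lowercasing, done in place.
 * tolower() consults the current LC_CTYPE locale but maps one byte to one
 * byte, so the string length never changes: no "SS" <-> "ß" rewrite. */
char *lc_strtolower(char *s)
{
    for (char *p = s; *p != '\0'; ++p)
        *p = (char)tolower((unsigned char)*p);  /* cast avoids UB for bytes >= 0x80 */
    return s;
}
```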
I just tested the above on both FreeBSD and Linux, and the results are
encouraging:
% echo STRASSE | env LANG=de_DE.ISO8859 tr '[[:upper:]]' '[[:lower:]]'
strasse
Thus, I contend that using the C library will not cause invalid results, and
that the only reason to have Apache's own implementation is performance, not
correctness.
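An implementation of Apache's own can avoid the locale entirely by folding only the 26 ASCII letters, which is all that protocol tokens such as header names need. This is a minimal sketch, assuming an ASCII execution character set in which 'A'..'Z' are contiguous. The names ascii_tolower and ascii_strcasecmp are hypothetical and are not the actual APR functions under discussion. The folding gives the same result under every locale, including tr_TR.

```c
/* Hypothetical ASCII-only case folding: only 'A'..'Z' are changed,
 * regardless of the current locale, so 'I' always becomes 'i'.
 * Assumes ASCII, where the uppercase letters are contiguous. */
static int ascii_tolower(int c)
{
    return (c >= 'A' && c <= 'Z') ? c - 'A' + 'a' : c;
}

/* Hypothetical case-insensitive comparison of two NUL-terminated tokens.
 * Returns <0, 0, or >0, like strcmp(). */
int ascii_strcasecmp(const char *a, const char *b)
{
    const unsigned char *ua = (const unsigned char *)a;
    const unsigned char *ub = (const unsigned char *)b;
    int diff;

    /* Stop at the first folded mismatch, or when both strings end together. */
    while ((diff = ascii_tolower(*ua) - ascii_tolower(*ub)) == 0 && *ua != '\0') {
        ++ua;
        ++ub;
    }
    return diff;
}
```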
-mi