Re: strcasecmp() comparing punctuation in ASCII?

Tony Harminc Thu, 01 Jun 2017 16:30:34 -0700

On 1 June 2017 at 18:23, Thomas David Rivers <[email protected]> wrote:
> I find that very odd...
>
> strcasecmp() is obliged to convert all upper-case letters into lower-case
> for the comparison.


IEEE Std 1003.1-2008, 2016 Edition says:

"When the LC_CTYPE category of the locale being used is from the POSIX
locale, these functions shall behave as if the strings had been
converted to lowercase and then a byte comparison performed.
Otherwise, the results are unspecified."
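
To make that wording concrete, here is a minimal sketch in Python that models the "convert to lowercase, then compare bytes" rule for plain ASCII. It is not the C library routine; the names posix_strcasecmp and upper_fold_compare are hypothetical, made up for this illustration. It shows why the direction of the case fold matters for punctuation: in ASCII, "_" (0x5F) sits between the upper-case and lower-case letters.

```python
def _sign(x: bytes, y: bytes) -> int:
    """Return -1, 0 or 1 from a plain byte-for-byte comparison."""
    return (x > y) - (x < y)


def posix_strcasecmp(a: bytes, b: bytes) -> int:
    """Model of the quoted rule: fold A-Z to a-z, then compare bytes."""
    # bytes.lower() only touches the ASCII letters A-Z.
    return _sign(a.lower(), b.lower())


def upper_fold_compare(a: bytes, b: bytes) -> int:
    """Hypothetical alternative that folds to upper case instead."""
    return _sign(a.upper(), b.upper())


print(posix_strcasecmp(b"Hello", b"hELLO"))  # 0: equal once case is folded
print(posix_strcasecmp(b"A", b"_"))          # 1: 'a' (0x61) > '_' (0x5F)
print(upper_fold_compare(b"A", b"_"))        # -1: 'A' (0x41) < '_' (0x5F)
```

So two implementations that both "ignore case" can disagree about where punctuation sorts relative to letters, depending on which way they fold.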

The whole thing is more than a bit ugly. Specifying a byte-for-byte
comparison for character strings is just so wrong, certainly in 2017.

John McK. has the right idea above, I think, but string comparison
goes much further than that. To provide a "culturally correct"
ordering of strings requires that sort keys, typically in four parts,
be generated for each string, and then those keys compared. There is
no byte-for-byte comparison (even with a single-byte character
encoding) that can produce results ordered the way a dictionary or
telephone book would have them, even in English.
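
As a rough illustration of the multi-part key idea, here is a small Python sketch. The particular four levels (base letters, accents, case, then the original string as a final tiebreaker) are my own simplifying assumption, not the scheme from any standard or from the book mentioned below, and sort_key is a hypothetical name. Level 2 only counts accents per letter rather than distinguishing them, which a real implementation would not get away with.

```python
import unicodedata


def sort_key(s: str) -> tuple:
    """Build a simplified four-part key; compared level by level."""
    level1 = []  # base letters and digits, case folded, accents stripped
    level2 = []  # number of accents attached to each base character
    level3 = []  # 0 for lower case, 1 for upper case
    for ch in unicodedata.normalize("NFD", s):
        if unicodedata.combining(ch):
            if level2:
                level2[-1] += 1  # accent belongs to the previous base char
        elif ch.isalnum():
            level1.append(ch.lower())
            level2.append(0)
            level3.append(1 if ch.isupper() else 0)
        # spaces, hyphens and apostrophes are ignored at levels 1 to 3
    # Level 4: the untouched string, so punctuation still breaks ties.
    return ("".join(level1), tuple(level2), tuple(level3), s)


# "r\u00e9sum\u00e9" is "resume" with acute accents on both e's.
words = ["coop", "co-op", "Zebra", "apple", "r\u00e9sum\u00e9",
         "resume", "Resume"]

print(sorted(words))                # plain code-point order
print(sorted(words, key=sort_key))  # keyed order
```

The plain sort puts "Resume" and "Zebra" ahead of "apple" because upper-case letters have lower code points. The keyed sort gives apple, co-op, coop, resume, Resume, résumé, Zebra, which is much closer to what a dictionary reader would expect.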

The best introduction I know to this topic is the 1990 IBM Redbook
GG24-3516-00, "Keys to Sort and Search for Culturally Expected Results".
I don't believe this book was ever available in machine-readable form,
and my copy is perfect-bound and hard to scan. But maybe it's time to
saw it up and get it to bitsavers, as we approach 30 years of how to do
this stuff being widely understood.

Tony H.

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
