On 1 June 2017 at 18:23, Thomas David Rivers <[email protected]> wrote:
> I find that very odd...
>
> strcasecmp() is obliged to convert all upper-case letters into lower-case
> for the comparison.
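For concreteness, here is a minimal sketch of the model described in the quote: lower-case both strings, then compare them byte by byte. It is written in Python rather than C only to keep it short. The name strcasecmp_sketch is hypothetical, and the sketch assumes ASCII-encoded byte strings; it is not how any particular C library implements strcasecmp().

```python
def strcasecmp_sketch(a: bytes, b: bytes) -> int:
    """Return <0, 0 or >0 in the manner of strcasecmp() (assumption: ASCII bytes).

    bytes.lower() changes only the ASCII letters A-Z, and Python compares
    bytes objects lexicographically by unsigned byte value, so this is
    "convert to lowercase, then do a byte comparison".
    """
    la, lb = a.lower(), b.lower()
    return (la > lb) - (la < lb)


if __name__ == "__main__":
    for left, right in [(b"Hello", b"hELLO"), (b"apple", b"Banana"), (b"co-op", b"coat")]:
        print(left, right, strcasecmp_sketch(left, right))
```

Note that the result is plain byte order after lower-casing: under the ASCII assumption, b"co-op" compares less than b"coat" simply because '-' (0x2D) precedes 'a' (0x61).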
IEEE Std 1003.1-2008, 2016 Edition says:

"When the LC_CTYPE category of the locale being used is from the POSIX
locale, these functions shall behave as if the strings had been converted
to lowercase and then a byte comparison performed. Otherwise, the results
are unspecified."

The whole thing is more than a bit ugly. Specifying a byte-for-byte
comparison for character strings is just so wrong, certainly in 2017.

John McK. has the right idea above, I think, but string comparison goes
much further than that. To provide a "culturally correct" ordering of
strings requires that sort keys, typically in four parts, be generated for
each string, and then those keys compared. There is no byte-for-byte
comparison (even with a single-byte character encoding) that can produce
results ordered the way a dictionary or telephone book would have them,
even in English.

The best introduction I know to this topic is the 1990 IBM Redbook
GG24-3516-00, "Keys to Sort and Search for Culturally Expected Results". I
don't believe this book was ever available in machine-readable form, and
my copy is perfect-bound and hard to scan. But maybe it's time to saw it
up and get it to bitsavers as we approach 30 years of how to do this stuff
being widely understood.

Tony H.
