Re: strcasecmp() comparing punctuation in ASCII?

Kirk Wolf Fri, 02 Jun 2017 09:04:58 -0700

"%" ahead of "*" would be consistent with the collation sequence defined by
the "POSIX C" locale:


https://www.ibm.com/support/knowledgecenter/SSLTBW_2.1.0/com.ibm.zos.v2r1.cbcpx01/cloc.htm

and here it says:
"The POSIX C locale uses the ASCII collation sequence; the first 128 ASCII
characters are defined in the collation sequence, and the remaining EBCDIC
characters are at the end of the collating sequence. "
https://www.ibm.com/support/knowledgecenter/SSLTBW_2.1.0/com.ibm.zos.v2r1.cbcpx01/clocdif.htm#clocdif
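
To make the difference concrete, here is a minimal sketch of how the two
characters order by raw code point in each encoding. The EBCDIC values are
the x'6C' and x'5C' that Charles quoted; the ASCII values are the standard
0x25 and 0x2A. The enum names and byte_order() are hypothetical, made up for
illustration only.

```c
/* Code points for '%' and '*' in each encoding (names are hypothetical). */
enum {
    ASCII_PERCENT  = 0x25, ASCII_STAR  = 0x2A,
    EBCDIC_PERCENT = 0x6C, EBCDIC_STAR = 0x5C
};

/* Compare two bytes as unsigned values; returns <0, 0 or >0 like strcmp(). */
static int byte_order(unsigned char a, unsigned char b)
{
    return (a > b) - (a < b);
}
```

byte_order(ASCII_PERCENT, ASCII_STAR) is negative ("%" sorts first), while
byte_order(EBCDIC_PERCENT, EBCDIC_STAR) is positive ("*" sorts first). So
"%" ahead of "*" matches ASCII order, not EBCDIC byte order.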

If this is what you are seeing, then strcasecmp() is collating according to
the current locale's LC_COLLATE.   That's not what the standard says, assuming
that IBM's "POSIX C" locale is the "POSIX" locale referenced by the
standard.   Or IBM interpreted the standard to mean that POSIX on an EBCDIC
machine should sort according to ASCII byte order, which makes sense
although it breaks the letter of the law.
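
One way to check which of these is happening would be to compare the library
strcasecmp() against a plain reading of the standard's wording (lowercase,
then byte comparison). The following is only a sketch under that reading;
ref_strcasecmp() is a hypothetical helper name, not anything from the IBM
runtime.

```c
#include <ctype.h>

/* Hypothetical reference: lowercase each byte with tolower(), then compare
   the results as unsigned byte values, as the standard describes for the
   POSIX locale. Returns <0, 0 or >0 like strcmp(). */
static int ref_strcasecmp(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;

    for (;; p++, q++) {
        int ca = tolower(*p);
        int cb = tolower(*q);

        if (ca != cb)
            return ca - cb;   /* first differing byte decides the order */
        if (ca == 0)
            return 0;         /* both strings ended together */
    }
}
```

On an EBCDIC system, ref_strcasecmp("%", "*") should come out positive,
since x'6C' is greater than x'5C'. If the library strcasecmp() returns a
negative value for the same pair, it is not doing a byte comparison.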

Kirk Wolf
Dovetailed Technologies
http://dovetail.com

On Fri, Jun 2, 2017 at 10:32 AM, Charles Mills <[email protected]> wrote:

> I had never paid any attention to what the locale was. I called
> setlocale() (yes, you use setlocale() to get the locale!) and the answer
> was "C". And yes, we are POSIX(ON).
>
> I am not seeing the behavior you ascribe to the standard. In the original
> issue I detected, "%" strcasecmp()'ed ahead of "*". Those characters should
> be unaffected by tolower(), and x'6C' (%) would certainly strcmp() after
> x'5C' (*). (My assumption at the time was that I had a bug and was not
> sorting the table as I intended.)
>
> Charles
>
>
> -----Original Message-----
> From: IBM Mainframe Discussion List [mailto:[email protected]] On
> Behalf Of Kirk Wolf
> Sent: Friday, June 2, 2017 7:06 AM
> To: [email protected]
> Subject: Re: strcasecmp() comparing punctuation in ASCII?
>
> The IEEE Std 1003.1, 2004 Edition was mentioned, but please note that it
> says this:
>
> "In the POSIX locale, *strcasecmp*() and *strncasecmp*() shall behave as
> if the strings had been converted to lowercase and then a byte comparison
> performed. *The results are unspecified in other locales.*"
>
> This is interesting in that it points to the next question:  what locale
> are you running under?
>
> The strcasecmp() function in XLC/C++ is poorly documented.  It only says:
>      "The strcasecmp() function is locale-sensitive".
>
> In z/OS XLC/C++, the default if you are running POSIX(ON) is the "POSIX C"
> locale.
> If this is the case for you, then the above statement from the standard
> would mean that:
>
>     strcasecmp(a,b)  ==  strcmp(tolower(a), tolower(b))
>          # assuming a tolower(char*) function based on tolower(char)
>
> But you aren't seeing this.
>
> So, either:
>
> a) you aren't running with the POSIX C locale  (in which case the result
> of strcasecmp() is unspecified by the standard).
>
> b) you are running with the POSIX C locale, but IBM didn't follow the
> standard.
>
> According to the XLC/C++ doc:
> " The POSIX C locale uses the ASCII collation sequence; the first 128
> ASCII characters are defined in the collation sequence, and the remaining
> EBCDIC characters are at the end of the collating sequence."
>
> Is this what you are seeing?  If so, then XLC/C++ strcasecmp() uses
> LC_COLLATE for the POSIX C locale (and not byte comparison as specified by
> the standard).   Or maybe locale "POSIX C" != "POSIX".  Who knows.
>
> Note:  if you are using the uppercase of a word/phrase as a key, you might
> consider saving the uppercase/lowercase key and then using strcmp() or
> strcoll() to compare.    Or you could define your own collation sequence
> via a translate table and then use the translated string as the key with
> strcmp().
> Using strcmp() for things like sort will probably be much faster anyway
> since it will be inlined using the CLST instruction, whereas strcasecmp()
> will be a function call.
>
> ----------------------------------------------------------------------
> For IBM-MAIN subscribe / signoff / archive access instructions,
> send email to [email protected] with the message: INFO IBM-MAIN
>

