Re: strcasecmp() comparing punctuation in ASCII?

Charles Mills Fri, 02 Jun 2017 08:32:26 -0700

I had never paid any attention to what the locale was. I called setlocale() 
(yes, you use setlocale() to get the locale!) and the answer was "C". And yes, 
we are POSIX(ON).
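
For anyone who has not used setlocale() as a query: passing NULL as the locale 
argument returns the name of the current locale without changing it. A minimal 
sketch follows; current_locale() is a hypothetical helper name, not something 
from my actual code.

```c
#include <locale.h>

/* Hypothetical helper: return the current locale name for a category.
 * A NULL locale argument makes setlocale() a pure query. */
static const char *current_locale(int category)
{
    return setlocale(category, NULL);
}
```

In a program that has not yet called setlocale() with a non-NULL argument, 
current_locale(LC_ALL) would be expected to return "C", since the C standard 
makes that the locale at startup.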


I am not seeing the behavior you ascribe to the standard. In the original issue 
I detected, "%" strcasecmp()'ed ahead of "*". Those characters should be 
unaffected by tolower(), and x'6C' (%) would certainly strcmp() after x'5C' 
(*). (My assumption at the time was that I had a bug and was not sorting the 
table as I intended.)
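
To make the code points concrete, here is a small sketch. The EBCDIC values 
are the ones quoted above; the ASCII values (x'25' for % and x'2A' for *) are 
the standard ones. byte_order() and the enum names are hypothetical, for 
illustration only.

```c
/* Code points of '%' and '*' in each encoding. */
enum {
    ASCII_PERCENT  = 0x25, ASCII_STAR  = 0x2A,
    EBCDIC_PERCENT = 0x6C, EBCDIC_STAR = 0x5C
};

/* Return -1, 0, or 1 according to the unsigned byte order of a and b. */
static int byte_order(unsigned char a, unsigned char b)
{
    return (a > b) - (a < b);
}
```

With these values % orders before * in ASCII and after it in EBCDIC, so a 
result that puts % ahead of * looks like an ASCII-ordered comparison rather 
than a byte comparison of EBCDIC data.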

Charles


-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[email protected]] On Behalf 
Of Kirk Wolf
Sent: Friday, June 2, 2017 7:06 AM
To: [email protected]
Subject: Re: strcasecmp() comparing punctuation in ASCII?

The IEEE Std 1003.1, 2004 Edition was mentioned, but please note that it says 
this:

"In the POSIX locale, *strcasecmp*() and *strncasecmp*() shall behave as if the 
strings had been converted to lowercase and then a byte comparison performed. 
*The results are unspecified in other locales.*"

This is interesting in that it points to the next question:  what locale are 
you running under?

The strcasecmp() function in XLC/C++ is poorly documented.  It only says:
     "The strcasecmp() function is locale-sensitive".

In z/OS XLC/C++, the default if you are running POSIX(ON) is the "POSIX C" 
locale. If this is the case for you, then the above statement from the 
standard would mean that:

    strcasecmp(a, b)  ==  strcmp(lower(a), lower(b))
         /* assuming a lower(char*) helper that applies tolower() to each char */
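
The equivalence above can be sketched in C as follows. ref_strcasecmp() is a 
hypothetical name; this is one reading of the standard's "as if" wording, not 
IBM's implementation.

```c
#include <ctype.h>

/* Hypothetical reference: compare two strings as if each had been
 * converted to lowercase and then compared byte by byte. */
static int ref_strcasecmp(const char *a, const char *b)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;

    for (;; pa++, pb++) {
        int ca = tolower(*pa);
        int cb = tolower(*pb);

        if (ca != cb)
            return (ca > cb) - (ca < cb);   /* first differing byte decides */
        if (ca == '\0')
            return 0;                       /* both strings ended together */
    }
}
```

In the POSIX locale, comparing the sign of this result with the sign of 
strcasecmp() on the same inputs (for example "%" and "*") would show whether 
the library follows the byte-comparison rule.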

But you aren't seeing this.

So, either:

a) you aren't running with the POSIX C locale (in which case the standard 
leaves the results of strcasecmp() unspecified).

b) you are running with the POSIX C locale, but IBM didn't follow the standard.

According to the XLC/C++ doc:
" The POSIX C locale uses the ASCII collation sequence; the first 128 ASCII 
characters are defined in the collation sequence, and the remaining EBCDIC 
characters are at the end of the collating sequence."

Is this what you are seeing? If so, then XLC/C++ strcasecmp() uses LC_COLLATE 
for the POSIX C locale (and not byte comparison as specified by the standard). 
Or maybe locale "POSIX C" != "POSIX". Who knows.

Note: if you are using the uppercase of a word/phrase as a key, you might 
consider saving the uppercased (or lowercased) key and then using strcmp() or 
strcoll() to compare. Or you could define your own collation sequence via a 
translate table and then use the translated string as the key with strcmp(). 
Using strcmp() for things like sort will probably be much faster anyway since 
it will be inlined using the CLST instruction, whereas strcasecmp() will be a 
function call.
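
A rough sketch of the translate-table idea: build the key once, then compare 
keys with plain strcmp(). The names xlate, init_xlate() and make_key() are 
hypothetical, and the table here only folds case with toupper(); a custom 
collation sequence would fill the table with whatever weights you want.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical translate table: maps each byte to its key byte. */
static unsigned char xlate[256];

/* Fill the table: identity for most bytes, letters folded to uppercase. */
static void init_xlate(void)
{
    for (int i = 0; i < 256; i++)
        xlate[i] = (unsigned char)toupper(i);
}

/* Write the translated key for src into dst, truncating to fit
 * dst_size bytes and always NUL-terminating when dst_size > 0. */
static void make_key(char *dst, size_t dst_size, const char *src)
{
    size_t i = 0;

    if (dst_size == 0)
        return;
    for (; src[i] != '\0' && i + 1 < dst_size; i++)
        dst[i] = (char)xlate[(unsigned char)src[i]];
    dst[i] = '\0';
}
```

Call init_xlate() once at startup, store the output of make_key() alongside 
each table entry, and sort or search on the stored keys with strcmp().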

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
