Re: strcasecmp() comparing punctuation in ASCII?

Charles Mills Thu, 01 Jun 2017 15:04:35 -0700

Thanks. I pretty much get all of your first paragraph. I just would have
expected on MVS that the order of the letters in the "C" (default) locale
would be pretty much the same as the order of *EBCDIC* characters when looked
at as plain 8-bit unsigned integers.


It's one of those things: std::sort and std::lower_bound (both with the same 
strcasecmp()-based less-than function) are working just fine. They are 
consistent. My application works flawlessly. I never suspected any issue. So I 
was just stunned to find that under the covers it is "working" more or less in 
ASCII rather than EBCDIC. 
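
A minimal sketch of why the application keeps working whichever ordering
strcasecmp() uses: the sort and the binary search only have to agree with each
other. The names lessNoCase, sortNoCase and containsNoCase are hypothetical
and are not taken from the actual application.

```cpp
#include <algorithm>
#include <string>
#include <strings.h>   // strcasecmp() (POSIX)
#include <vector>

// Hypothetical comparator: case-insensitive "less than" built on strcasecmp().
bool lessNoCase(const std::string& a, const std::string& b) {
    return strcasecmp(a.c_str(), b.c_str()) < 0;
}

// Sort with the comparator so that later searches see a consistent order.
void sortNoCase(std::vector<std::string>& v) {
    std::sort(v.begin(), v.end(), lessNoCase);
}

// Binary search with the SAME comparator that was used for the sort.
bool containsNoCase(const std::vector<std::string>& sorted,
                    const std::string& key) {
    auto it = std::lower_bound(sorted.begin(), sorted.end(), key, lessNoCase);
    // lower_bound returns the first element not less than key; it is a match
    // only if key is not less than that element either.
    return it != sorted.end() && !lessNoCase(key, *it);
}
```

Whether digits end up before or after letters changes where an entry lands in
the vector, but not whether the search finds it.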

Charles


-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[email protected]] On Behalf 
Of John McKown
Sent: Thursday, June 1, 2017 1:08 PM
To: [email protected]
Subject: Re: strcasecmp() comparing punctuation in ASCII?

On Thu, Jun 1, 2017 at 2:43 PM, Charles Mills <[email protected]> wrote:

> It's clearly doing everything in ASCII:
>
> strcasecmp("Z", "0") 122
>
> It's interesting. I use the same compare function for both a sort and 
> for a binary search, so it all works correctly -- it's just not 
> working the way I think it is.
>
> Charles
>
>
I'm not any kind of an expert on this. So take everything I say with about a
kilo of NaCl. As the page you referenced states, the strcasecmp() function is
locale sensitive. The locale ordering is NOT based on the code point (hex
value) at all (well, at least conceptually). It is based on the "rune", where
"rune" is basically the concept of what character this is, such as used in
Unicode (e.g. LATIN-SMALL-LETTER-A is 'a' regardless of the hex value(s) used
to store that in memory). For something "simple" you can sort of think of the
hex value as being an index into an array of values, where the value at that
index is the relative collating position of the "rune" involved in the
comparison. This is how strcasecmp("A","a") is "equal". The relative collating
positions of "A" and "a" are the same, so the comparison is "equal".
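
The "array of values" idea above could be sketched roughly as follows. This is
only an illustration, not how any particular C library implements
strcasecmp(). WeightTable, makeAsciiFoldedTable and compareByWeight are
hypothetical names, and the table assumes ASCII-encoded input, which is why
the letters are written as hex values.

```cpp
#include <array>

// Hypothetical table type: one collating weight per possible byte value.
using WeightTable = std::array<int, 256>;

// Toy table for ASCII-encoded text: every byte collates at its own code
// point, except that 'A'..'Z' (0x41..0x5A) share the weights of 'a'..'z'.
WeightTable makeAsciiFoldedTable() {
    WeightTable w{};
    for (int i = 0; i < 256; ++i) {
        w[i] = i;
    }
    for (int c = 0x41; c <= 0x5A; ++c) {
        w[c] = c + 0x20;   // fold upper case onto lower case
    }
    return w;
}

// Compare two NUL-terminated strings by collating weight, not by raw byte.
// Returns <0, 0 or >0 in the style of strcmp().
int compareByWeight(const WeightTable& w, const char* a, const char* b) {
    for (;; ++a, ++b) {
        const unsigned char ca = static_cast<unsigned char>(*a);
        const unsigned char cb = static_cast<unsigned char>(*b);
        if (ca == 0 || cb == 0) {
            return (ca != 0) - (cb != 0);   // the shorter string sorts first
        }
        if (w[ca] != w[cb]) {
            return w[ca] - w[cb];
        }
    }
}
```

With this table, the bytes 0x41 ("A") and 0x61 ("a") have the same weight and
so compare equal. A table built for a different code page would map different
byte values to the same weights.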

Of course, it looks like it is an ASCII compare because the relative
positioning of the letters in the "C" (default) locale is pretty much the
same as the order of "ASCII" characters when looked at as plain 8-bit
unsigned integers.
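
For reference, a small sketch of how the raw byte order differs between the
two encodings for the characters in the example ("Z", "z" and "0"). The
EBCDIC values are the ones shared by code pages IBM-037 and IBM-1047. The
struct and function names are hypothetical.

```cpp
// Hypothetical holder for the three code points of interest.
struct CodePoints {
    unsigned char upperZ;
    unsigned char lowerZ;
    unsigned char zero;
};

constexpr CodePoints kAscii{0x5A, 0x7A, 0x30};
constexpr CodePoints kEbcdic{0xE9, 0xA9, 0xF0};

// True when the digit '0' sorts before both 'Z' and 'z' by raw byte value.
constexpr bool digitBeforeLetters(const CodePoints& cp) {
    return cp.zero < cp.upperZ && cp.zero < cp.lowerZ;
}

// In ASCII the digits precede the letters; in EBCDIC they follow them.
static_assert(digitBeforeLetters(kAscii), "ASCII: '0' < 'Z' and '0' < 'z'");
static_assert(!digitBeforeLetters(kEbcdic), "EBCDIC: '0' > 'Z' and '0' > 'z'");
```

So a positive result from strcasecmp("Z", "0") matches the ASCII ordering. A
plain byte comparison of the EBCDIC values would give a negative one.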

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
