(I'm trying to move this to IBM-MAIN; it really doesn't belong on ASSEMBLER-LIST. And not trimming quoted material as much as I usually would.)
On 2013-09-04 10:29, Tony Harminc wrote:
> On 1 September 2013 00:51, Paul Gilmartin wrote:
>> On 2013-08-31, at 08:55, John Gilmore wrote:
>>>
>>> ... They use data transformations to make it possible
>>> for two keys to be compared using a single CLC[L]. (DB2 does similar
>>> things too.)
>>>
>> This can be particularly complex for literary collating conventions
>> such as EN_US which DFSORT gets terribly wrong. I tried a PMR on
>> this a few years ago. When I reported that DFSORT and a C program
>> using strcoll() produce similar incorrect results, DFSORT and I
>> agreed that the problem should belong to LE.
>>
>> LE gave me WAD with a rationale so outrageous that I gave up in
>> disgust, making no effort to escalate.
>
> Isn't it a POSIX violation to produce incorrect collation results for
> a locale? Not, I suppose, that that's stopped them before.
>
> It's a shame because IBM was in the forefront of getting this
> collation stuff right, and into the POSIX standards. See the early
> Redbook GG24-3516 Keys to Sort and Search for Culturally Expected
> Results, and much subsequent work from IBM's long gone National
> Language Technical Center.
>
Thanks for the reference. I'll look for it on publibz. Or might
I find it on InfoCenter?

The first point of frustration is the inconsistency in the *names*
of the locales. They're case-sensitive on most platforms;
case-insensitive (I think) on z/OS.
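
Whether a given spelling of a locale name is accepted can be checked at run time rather than guessed. A minimal sketch in C, assuming only the standard setlocale() and strcoll(); the helper name first_usable_locale is hypothetical and is not part of the test case discussed in this thread:

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the first candidate name that
 * setlocale() accepts for LC_COLLATE, or NULL if none is accepted.
 * On success, LC_COLLATE is left set to the returned name, so a
 * following strcoll() call collates under that locale. */
static const char *first_usable_locale(const char *const *candidates,
                                       size_t n)
{
    assert(candidates != NULL);
    for (size_t i = 0; i < n; i++) {
        /* setlocale() returns NULL when the name is not recognized. */
        if (setlocale(LC_COLLATE, candidates[i]) != NULL)
            return candidates[i];
    }
    return NULL;
}
```

Passing two spellings that differ only in case, for example "en_US.UTF-8" followed by "En_US.UTF-8", would show which one a platform takes first. Some C libraries accept almost any name, so a non-NULL return is only evidence that the name was accepted, not that the collation tables behind it are correct.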

I needed to supply the following preamble to make my test case
portable:

static char
#if defined( __APPLE__ )
    *US = "en_US.UTF-8",
    *CA = "en_CA.UTF-8",
#elif defined( __linux__ )
    *US = "en_US.utf8",
    *CA = "en_CA.utf8",
#elif defined( __MVS__ )
#if ( '0' == 0xf0 )
    *US = "En_US.IBM-1047",       /* EBCDIC */
    *CA = "En_CA.IBM-1047",
#else
    *US = "En_US.UTF-8.xplink",   /* ASCII */
    *CA = "En_GB.UTF-8.xplink",
#endif
#elif defined( __sun )
    *US = "en_US.ISO8859-1",
    *CA = "en_CA.ISO8859-1",
#else
    *US = "en_US.utf8",
    *CA = "en_CA.utf8",
#endif
    *C = "C";

<SIGH>

gil