(I'm trying to move this to IBM-MAIN; it really doesn't
belong on ASSEMBLER-LIST.  And I'm not trimming quoted material
as much as I usually would.)

On 2013-09-04 10:29, Tony Harminc wrote:
> On 1 September 2013 00:51, Paul Gilmartin wrote:
>> On 2013-08-31, at 08:55, John Gilmore wrote:
>>>
>>> ...  They use data transformations to make it possible
>>> for two keys to be compared using a single CLC[L].  (DB2 does similar
>>> things too.)
>>>
>> This can be particularly complex for literary collating conventions
>> such as EN_US which DFSORT gets terribly wrong.  I tried a PMR on
>> this a few years ago.  When I reported that DFSORT and a C program
>> using strcoll() produce similar incorrect results, DFSORT and I
>> agreed that the problem should belong to LE.
>>
>> LE gave me WAD with a rationale so outrageous that I gave up in
>> disgust, making no effort to escalate.
> 
> Isn't it a POSIX violation to produce incorrect collation results for
> a locale? Not, I suppose, that that's stopped them before.
> 
> It's a shame because IBM was in the forefront of getting this
> collation stuff right, and into the POSIX standards. See the early
> Redbook GG24-3516 Keys to Sort and Search for Culturally Expected
> Results, and much subsequent work from IBM's long gone National
> Language Technical Center.
> 
Thanks for the reference.  I'll look for it on publibz.  Or might I
find it on InfoCenter?

The first point of frustration is the inconsistency in the *names*
of the locales.  They're case-sensitive on most platforms; case-
insensitive (I think) on z/OS.  I needed to supply the following
preamble to make my test case portable:

static const char
#if defined( __APPLE__ )
    *US = "en_US.UTF-8",
    *CA = "en_CA.UTF-8",
#elif defined( __linux__ )
    *US = "en_US.utf8",
    *CA = "en_CA.utf8",
#elif defined( __MVS__ )
#if ( '0' == 0xf0 )
    *US = "En_US.IBM-1047",      /* EBCDIC */
    *CA = "En_CA.IBM-1047",
#else
    *US = "En_US.UTF-8.xplink",  /* ASCII  */
    *CA = "En_GB.UTF-8.xplink",
#endif
#elif defined( __sun )
    *US = "en_US.ISO8859-1",
    *CA = "en_CA.ISO8859-1",
#else
    *US = "en_US.utf8",
    *CA = "en_CA.utf8",
#endif
    *C = "C";
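
For anyone who wants to try this on their own platform, here is a minimal
sketch of how such names might be fed to setlocale() and strcoll().  The
helper names collate_in() and report() and the LOCALE_UNAVAILABLE sentinel
are made up for illustration; they are not from my original test case.  The
sketch assumes only standard C library calls and treats a NULL return from
setlocale() as "locale not installed here".

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sentinel: setlocale() rejected the requested name. */
#define LOCALE_UNAVAILABLE 2

/*
 * Compare a and b under the collation rules of locale_name.
 * Returns -1, 0, or 1, or LOCALE_UNAVAILABLE if the locale cannot be set.
 */
int collate_in(const char *locale_name, const char *a, const char *b)
{
    assert(locale_name != NULL && a != NULL && b != NULL);

    if (setlocale(LC_COLLATE, locale_name) == NULL)
        return LOCALE_UNAVAILABLE;

    int r = strcoll(a, b);
    setlocale(LC_COLLATE, "C");   /* put collation back to the startup default */
    return (r > 0) - (r < 0);     /* normalize to -1, 0, or 1 */
}

/* Print one comparison; name is whichever string the preamble selected. */
void report(const char *name, const char *a, const char *b)
{
    int r = collate_in(name, a, b);

    if (r == LOCALE_UNAVAILABLE)
        printf("%-22s (not available)\n", name);
    else
        printf("%-22s \"%s\" %c \"%s\"\n", name, a, "<=>"[r + 1], b);
}
```

With the preamble above in scope, calling report(US, "B", "a") and then
report(C, "B", "a") shows side by side whether the named locale collates
differently from plain code-point order, and reports a missing locale
instead of silently falling back.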

<SIGH>
gil
