(I'm trying to move this to IBM-MAIN; it really doesn't
belong on ASSEMBLER-LIST. And not trimming quoted material
as much as I usually would.)
On 2013-09-04 10:29, Tony Harminc wrote:
> On 1 September 2013 00:51, Paul Gilmartin wrote:
>> On 2013-08-31, at 08:55, John Gilmore wrote:
>>>
>>> ... They use data transformations to make it possible
>>> for two keys to be compared using a single CLC[L]. (DB2 does similar
>>> things too.)
>>>
>> This can be particularly complex for literary collating conventions
>> such as EN_US which DFSORT gets terribly wrong. I tried a PMR on
>> this a few years ago. When I reported that DFSORT and a C program
>> using strcoll() produce similar incorrect results, DFSORT and I
>> agreed that the problem should belong to LE.
>>
>> LE gave me WAD with a rationale so outrageous that I gave up in
>> disgust, making no effort to escalate.
>
> Isn't it a POSIX violation to produce incorrect collation results for
> a locale? Not, I suppose, that that's stopped them before.
>
> It's a shame because IBM was in the forefront of getting this
> collation stuff right, and into the POSIX standards. See the early
> Redbook GG24-3516 Keys to Sort and Search for Culturally Expected
> Results, and much subsequent work from IBM's long gone National
> Language Technical Center.
>
Thanks for the reference. I'll look for it on publibz. Or might I
find it on InfoCenter?

The first point of frustration is the inconsistency in the *names*
of the locales. They're case-sensitive on most platforms; case-
insensitive (I think) on z/OS. I needed to supply the following
preamble to make my test case portable:
static char
#if defined( __APPLE__ )
    *US = "en_US.UTF-8",
    *CA = "en_CA.UTF-8",
#elif defined( __linux__ )
    *US = "en_US.utf8",
    *CA = "en_CA.utf8",
#elif defined( __MVS__ )
  #if ( '0' == 0xf0 )
    *US = "En_US.IBM-1047", /* EBCDIC */
    *CA = "En_CA.IBM-1047",
  #else
    *US = "En_US.UTF-8.xplink", /* ASCII */
    *CA = "En_GB.UTF-8.xplink",
  #endif
#elif defined( __sun )
    *US = "en_US.ISO8859-1",
    *CA = "en_CA.ISO8859-1",
#else
    *US = "en_US.utf8",
    *CA = "en_CA.utf8",
#endif
    *C = "C";
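
For anyone who wants to try this, a minimal sketch of how names like
these might be fed to setlocale() and strcoll() follows. It is not my
actual test case; sort_in_locale() and coll_cmp() are hypothetical
helpers made up for illustration, and the fallback to "C" when a
locale is not installed is an assumption of the sketch.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: each element is a char *, compared with
   strcoll(), which honors LC_COLLATE of the current locale. */
static int coll_cmp(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcoll(*sa, *sb);
}

/* Hypothetical helper: select collation locale `name`, then sort
   `words` in place.  Returns 1 if setlocale() accepted the name,
   0 if it did not (the words are then sorted under "C"). */
static int sort_in_locale(const char *name, const char **words, size_t n)
{
    int ok;

    assert(name != NULL && words != NULL);
    ok = setlocale(LC_COLLATE, name) != NULL;
    if (!ok)
        setlocale(LC_COLLATE, "C");   /* assumed fallback */
    qsort(words, n, sizeof *words, coll_cmp);
    return ok;
}
```

Calling sort_in_locale(US, words, n), then again with CA and with C,
on the same word list and comparing the three orderings is one way
to see where the platforms disagree.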
<SIGH>
gil
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN