On 28 October 2013 15:08, John McKown <[email protected]> wrote:
> I wasn't wanting to translate words. But when we do a comparison on the z,
> we basically just do a byte-for-byte compare. That does not always give the
> proper result. I am not very familiar with "culturally correct" collations.

I highly recommend (as I have done here before) the old (1990) IBM
Redbook GG24-3516, "Keys to Sort and Search for Culturally Expected
Results". Sadly, it is not only out of print, but I believe it was only
ever available on paper.

Much of the detail is dated, but it is still the best overall
introduction to the problem and its solution that I know of. And it
even has REXX examples...

> But I do remember (from 10e7 years ago) that in Spanish, the "ch" is
> considered a single character which collates after "c" and before "d". So,
> from one standpoint, to do a "correct" compare would somehow need to say
> that the string: "chorizo" is greater than "ciudad".

[In passing, the Spanish, in cooperation with their colleagues in most
Latin American countries, updated their standard some years ago to
remove this behaviour. This is a bit sad, because their update came at
almost exactly the time when correct collation behaviour was appearing
in popular systems like Windows and OS/390, and was being absorbed
into the Unicode standard. Their expressed desire to move the language
into the "modern" world of computing seems, with 20-20 hindsight,
quite misguided.]

> But in both CP-1047 and ISO8859-1, this is not true.

It is not true in any codepage, and the much stronger claim is that in
general proper collation cannot be done on a character-by-character
basis, no matter what the encoding. The "Keys" in the Redbook's title
refer, somewhat cutely, to the notion of building strings for
comparison that encode written-language attributes (such as accents,
case and multi-letter units) beyond simple character-by-character
collation order. Such keys can be built as needed, or built at data
entry and stored with (or even instead of) the raw data, depending on
usage patterns and performance
requirements. All this is now quite standard behaviour on almost all
systems.
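A minimal sketch of the key idea, in Python, assuming a simplified
model of the traditional Spanish rules ("ch" and "ll" treated as single
letters, "ñ" as its own letter between "n" and "o"). The alphabet table,
the weight values and the names sort_key, RANK and OTHER are
hypothetical illustrations, not taken from the Redbook or from any real
collation library; production code would more likely rely on ICU or the
platform's locale support. Each string is turned into a byte string
holding letter weights, then accent weights, then case weights, so that
an ordinary byte-for-byte compare of the keys yields the multi-level
order.

```python
import unicodedata

# HYPOTHETICAL, simplified model of *traditional* Spanish collation:
# "ch" and "ll" count as single letters, and "ñ" is its own letter
# between "n" and "o". Real rules (and real libraries) are richer.
ALPHABET = ["a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k",
            "l", "ll", "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u",
            "v", "w", "x", "y", "z"]
RANK = {letter: i + 1 for i, letter in enumerate(ALPHABET)}  # weights >= 1
OTHER = 100  # anything outside ALPHABET sorts after "z", by code point
SEP = bytes(4)  # level separator; lower than any (non-zero) weight


def _units(text):
    """Split text into collation units; "ch"/"ll" (any case) are one unit."""
    i = 0
    while i < len(text):
        step = 2 if text[i:i + 2].lower() in ("ch", "ll") else 1
        yield text[i:i + step]
        i += step


def _pack(weights):
    # Fixed-width big-endian weights keep a byte-wise compare aligned.
    return b"".join(w.to_bytes(4, "big") for w in weights)


def sort_key(text):
    """Build a byte-string key: letters first, then accents, then case."""
    primary, secondary, tertiary = [], [], []
    for unit in _units(unicodedata.normalize("NFC", text)):
        lower = unit.lower()
        if lower in RANK:            # plain letter, "ch", "ll" or "ñ"
            base, accent = lower, 0
        else:                        # e.g. "é" -> "e" + combining acute
            decomposed = unicodedata.normalize("NFD", lower)
            base = "".join(c for c in decomposed
                           if not unicodedata.combining(c))
            marks = [c for c in decomposed if unicodedata.combining(c)]
            accent = ord(marks[0]) if marks else 0
        if not base:                 # a stray combining mark: ignore it
            continue
        primary.append(RANK.get(base, OTHER + ord(base[0])))
        secondary.append(accent + 1)                # unaccented sorts first
        tertiary.append(1 if unit == lower else 2)  # lower case sorts first
    return (_pack(primary) + SEP + _pack(secondary) + SEP
            + _pack(tertiary))


words = ["dama", "chorizo", "ciudad", "Ángel", "zorro", "llave", "luz"]
# Plain byte-for-byte compare of ISO 8859-1 data:
print(sorted(words, key=lambda w: w.encode("latin-1")))
# ['chorizo', 'ciudad', 'dama', 'llave', 'luz', 'zorro', 'Ángel']
# Compare of the hypothetical multi-level keys:
print(sorted(words, key=sort_key))
# ['Ángel', 'ciudad', 'chorizo', 'dama', 'luz', 'llave', 'zorro']
```

Because the key is just bytes, it could equally be computed once at
data entry and stored in an indexed column next to (or instead of) the
original text, which is the trade-off described in the paragraph above.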

Tony H.

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
