On 28 October 2013 15:08, John McKown <[email protected]> wrote:
> I wasn't wanting to translate words. But when we do a comparison on the z,
> we basically just do a byte-for-byte compare. That does not always give the
> proper result. I am not very familiar with "culturally correct" collations.
I highly recommend (as I have done here before) the old (1990) IBM Redbook GG24-3516, "Keys to Sort and Search for Culturally Expected Results". Sadly, it is not only out of print, but I believe it was only ever available on paper. Much of the detail is dated, but it is still the best overall introduction to the problem and its solution that I know of. And it even has REXX examples...

> But I do remember (from 10e7 years ago) that in Spanish, the "ch" is
> considered a single character which collates after "c" and before "d". So,
> from one stand point, to do a "correct" compare would somehow need to say
> that the string: "chorizo" is greater than "ciudad".

[In passing, the Spanish, in cooperation with their colleagues in most Latin American countries, updated their standard some years ago to remove this behaviour. This is a bit sad, because their update came at almost exactly the time when correct collation behaviour was appearing in popular systems like Windows and OS/390, and was being absorbed into the Unicode standard. With 20-20 hindsight, their expressed desire to move the language into the "modern" world of computing seems quite misguided.]

> But in both CP-1047 and ISO8859-1, this is not true.

It is not true in any codepage. The much stronger claim is that, in general, proper collation cannot be done on a character-by-character basis, no matter what the encoding. The "Keys" in the Redbook's title refers somewhat cutely to the notion of building strings for comparison that encode written-language attributes other than character-by-character collation order. Such keys can be built as needed, or built upon data entry and stored with (or even instead of) the raw data, depending on usage patterns and performance requirements. All this is now quite standard behaviour on almost all systems.

Tony H.
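The Python below is a rough sketch of the sort-key idea, not taken from the Redbook. It builds a key for a simplified form of the traditional Spanish rules quoted above, in which "ch" and "ll" are single letters sorting after "c" and "l". The alphabet table, the names sort_key and sort_key_bytes, and the restriction to unaccented letters plus "ñ" are all assumptions made for illustration. Real collation, such as the Unicode Collation Algorithm, adds further levels for accents and case.

```python
# Sketch: a sort key for a simplified form of traditional Spanish collation,
# in which "ch" and "ll" count as single letters. Assumption: input is limited
# to the letters in ALPHABET (any case); accent and case tie-breaking levels
# are ignored, and any other character raises KeyError.

# Hypothetical collation order: "ch" after "c", "ll" after "l", "ñ" after "n".
ALPHABET = [
    "a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l", "ll",
    "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
]
RANK = {letter: rank for rank, letter in enumerate(ALPHABET)}


def sort_key(word: str) -> tuple[int, ...]:
    """Map a word to a tuple of collation ranks, matching digraphs first."""
    text = word.lower()
    key = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in RANK:  # "ch" or "ll" as one letter
            key.append(RANK[pair])
            i += 2
        else:
            key.append(RANK[text[i]])  # KeyError for unsupported characters
            i += 1
    return tuple(key)


def sort_key_bytes(word: str) -> bytes:
    """Encode the key as bytes so stored keys can be compared byte-for-byte."""
    return bytes(sort_key(word))  # every rank is below 256


if __name__ == "__main__":
    words = ["ciudad", "chorizo", "curso", "dedo"]
    # Raw character order: ['chorizo', 'ciudad', 'curso', 'dedo']
    print(sorted(words))
    # Traditional Spanish order: ['ciudad', 'curso', 'chorizo', 'dedo']
    print(sorted(words, key=sort_key))
    # Byte-for-byte compare of the stored keys: True
    print(sort_key_bytes("chorizo") > sort_key_bytes("ciudad"))
```

The key is itself a byte string, so a plain byte-for-byte compare of stored keys gives the expected order. That is why building keys at data-entry time and storing them with the data can work.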
---------------------------------------------------------------------- For IBM-MAIN subscribe / signoff / archive access instructions, send email to [email protected] with the message: INFO IBM-MAIN
