[EMAIL PROTECTED] wrote:

> But believing that there is a collation order that works across all the
> European (Latin script, let's not even go to Cyrillic and Greek) languages
> is a very hopeless fallacy:

Quite true.  But there is a *default* collation that works *fairly* well,
plus machinery for tailoring it to particular cases: see
http://www.unicode.org/unicode/reports/tr10/

Note that collation is user-locale-specific, not language-specific:
as an anglophone browsing a list of Swedish personal names, I want them
collated in English order (ignore accents), not Swedish order.
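
A rough illustration of that point, as a toy sketch only: this is not the
algorithm from the TR, and the names english_key, swedish_key and the sample
list are made up for the example. It assumes "English order" means dropping
accents outright (the TR itself treats them as a lower-level difference), and
that "Swedish order" means å, ä, ö follow z.

```python
import unicodedata

def english_key(name):
    # Assumption: an anglophone reader wants accents ignored, so decompose
    # the string and drop the combining marks before comparing.
    decomposed = unicodedata.normalize("NFD", name)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # The original name is kept as a tie-breaker for otherwise equal keys.
    return (base.casefold(), name)

# Assumption: a simplified Swedish alphabet with å, ä, ö after z.
SWEDISH_ALPHABET = unicodedata.normalize("NFC", "abcdefghijklmnopqrstuvwxyzåäö")
SWEDISH_RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}

def swedish_key(name):
    composed = unicodedata.normalize("NFC", name).casefold()
    # Characters outside the toy alphabet all sort last.
    return [SWEDISH_RANK.get(ch, len(SWEDISH_ALPHABET)) for ch in composed]

names = ["Östberg", "Olsson", "Åberg", "Andersson", "Zetterlund", "Ärlig"]

# Same data, two user locales, two different orders.
english_order = sorted(names, key=english_key)
swedish_order = sorted(names, key=swedish_key)

print(english_order)
print(swedish_order)
```

With the English key the accented names land among the A's and O's; with the
Swedish key they follow Zetterlund.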

> the things conflicting are the 'accented' characters (like
> a-diaereses and o-diaereses in German versus in Swedish/Finnish), and special
> 'ligature'-like cases like the 'll' and 'ch' of Spanish, and pairs like v/w and
> i/j being sorted "to the same place", and so on.

This conflates two separate issues: tailoring the order for a particular
locale, and treating a sequence of characters as a single collating unit.
Both are well handled by the collation TR.
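
The second issue can be sketched separately from the first. The toy code
below is not the TR's machinery; spanish_key and the word list are
hypothetical, and the alphabet assumes traditional Spanish ordering, where
"ch" sorts after "c" and "ll" after "l". Accented vowels are not handled.

```python
import unicodedata

# Assumption: traditional Spanish order, with "ch" and "ll" as single units.
TRADITIONAL_SPANISH = "a b c ch d e f g h i j k l ll m n ñ o p q r s t u v w x y z".split()
RANK = {unicodedata.normalize("NFC", unit): i
        for i, unit in enumerate(TRADITIONAL_SPANISH)}

def spanish_key(word):
    text = unicodedata.normalize("NFC", word).casefold()
    key = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in RANK:
            # Longest match first: the two-character sequence is one unit.
            key.append(RANK[pair])
            i += 2
        else:
            # Anything outside the toy alphabet sorts last.
            key.append(RANK.get(text[i], len(RANK)))
            i += 1
    return key

words = ["cuna", "chico", "dama", "llama", "luz", "casa"]

plain_order = sorted(words)                     # character-by-character order
spanish_order = sorted(words, key=spanish_key)  # "ch" and "ll" as single units

print(plain_order)
print(spanish_order)
```

In the plain order "chico" falls between "casa" and "cuna"; with the
two-character units it follows every other word starting with "c", and
"llama" follows "luz".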

-- 

Schlingt dreifach einen Kreis um dies! || John Cowan <[EMAIL PROTECTED]>
Schliesst euer Aug vor heiliger Schau,  || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,           || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.            -- Coleridge (tr. Politzer)
