There is space among the Unicode code points for resolving the chorizo
problem and others of its sort.  One way is to use orthographic marks
or extra, judiciously chosen letters, as in another Spanish<==>English
correspondence, that of cañon<==>canyon.
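A one-many respelling of this kind is easy to express with Python's str.translate, which lets a single character map to a string of several letters. The table below is a hypothetical illustration that covers only ñ; it is not drawn from any standard. The input is first normalized to NFC so that an ñ keyed as n plus a combining tilde is caught as well.

```python
import unicodedata

# Hypothetical Spanish-to-English respelling table: one character maps to
# two letters (a one-many equivalence). The upper-case entry suits all-caps text.
SPANISH_TO_ENGLISH = str.maketrans({"ñ": "ny", "Ñ": "NY"})


def to_english_spelling(word: str) -> str:
    """Respell a Spanish word using the hypothetical table above."""
    # NFC composes 'n' + U+0303 COMBINING TILDE into the single code point U+00F1.
    return unicodedata.normalize("NFC", word).translate(SPANISH_TO_ENGLISH)


print(to_english_spelling("cañon"))        # canyon
print(to_english_spelling("can\u0303on"))  # canyon
```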

Language-specific many-one and one-many equivalences for particular
characters are also easy to implement.  Most word processors, for
example, will insert a blank/space before a deux-points, ':', or a
point-virgule, ';', in French text and omit it before a colon, ':', or
a semicolon, ';', in English text.

A processing program can cope with these differences when it knows
which national language it is dealing with locally, i.e., for the text
at hand rather than globally.
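The French rule can be sketched as a function that consults a language code before touching the punctuation. This is a minimal sketch, not how any particular word processor does it: the function name is hypothetical, 'fra' and 'fre' (the ISO 639-2 codes for French) are assumed as the codes to test for, an ordinary space stands in for the narrow no-break space French typesetters often prefer, and only ':' and ';' are handled, as in the text.

```python
# Assumed language codes for French (ISO 639-2 'fra' and 'fre').
FRENCH_CODES = {"fra", "fre"}


def space_punctuation(text: str, lang: str) -> str:
    """Put a space before ':' and ';' in French text; leave other languages alone.

    Hypothetical helper for illustration only.
    """
    if lang not in FRENCH_CODES:
        return text  # e.g. English: no space before a colon or semicolon
    out = []
    for i, ch in enumerate(text):
        # Add a space only if one is not already there.
        if ch in ":;" and i > 0 and not text[i - 1].isspace():
            out.append(" ")
        out.append(ch)
    return "".join(out)


print(space_punctuation("Attention: danger; fin", "fra"))  # Attention : danger ; fin
print(space_punctuation("Note: yes; no", "eng"))           # Note: yes; no
```

Applied blindly, such a rule would also mangle things like URLs ("http://") embedded in French text, which is one more reason the program needs to know the language of each passage and not just of the whole document.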

There is a three-letter (Roman-alphabetic) ISO standard
national-language code that is useful for this purpose, but even it is
problematic in some ways.

Like all long-lived coding schemes it has been abused to make
distinctions that were not envisaged in its original design.  There
are, for example, two codes for the written Chinese of China, 'chs'
for modern (simplified) Chinese and 'cht' for traditional Chinese; and
there are two codes for Serbo-Croatian, 'shc' when the Cyrillic
alphabet is used and 'shl' when the Latin/Roman alphabet is used.
Worse, there is an important subtext here: if one is Catholic one uses
the Roman alphabet, and if one is Orthodox one uses the Cyrillic
alphabet.  One thus needs to
know more than one should need to know in order to use these codes
correctly.

John Gilmore, Ashland, MA 01721 - USA

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
