There is space among the Unicode code points for resolving the chorizo problem and others of its sort. One way is to use orthographic marks or extra, judiciously chosen letters, as in another Spanish<==>English correspondence, that of cañon<==>canyon.
For particular characters, language-specific many-one and one-many equivalences are also easy to implement. Most word processors, for example, will insert a blank/space preceding a deux-points, ':', or a point-virgule, ';', in French text and omit it before a colon, ':', or a semicolon, ';', in English text. A processing program can cope with these differences when it knows which national language it is dealing with locally, i.e., not too globally. There is a (three-roman-alphabetic-character) ISO standard national-language code that is useful for this purpose, but even it is problematic in some ways. Like all long-lived coding schemes, it has been abused to make distinctions that were not envisaged in its original design. There are, for example, two codes for the written Chinese of China, 'chs' for modern (simplified) Chinese and 'cht' for traditional Chinese; and there are two codes for Serbo-Croatian, 'shc' when the Cyrillic alphabet is used and 'shl' when the Latin/Roman alphabet is used. Worse, there is an important subtext here: if one is Catholic one uses the Roman alphabet, and if one is Orthodox one uses the Cyrillic alphabet. One thus needs to know more than one should need to know in order to use these codes correctly.

John Gilmore, Ashland, MA 01721 - USA
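
The French/English punctuation rule mentioned above can be sketched in a few lines of Python. This is only an illustration, not how any particular word processor does it: the function name space_punctuation, the table SPACE_BEFORE, and the two-letter tags "fr" and "en" are all made up for the example, and a plain blank is assumed as the inserted character.

```python
import re

# Hypothetical rule table: language tag -> text placed before ':' or ';'.
# Only the two languages discussed above are listed; anything else gets no gap.
SPACE_BEFORE = {"fr": " ", "en": ""}


def space_punctuation(text: str, lang: str) -> str:
    """Normalize the blank before ':' and ';' according to the language."""
    gap = SPACE_BEFORE.get(lang, "")
    # Drop any blanks already in front of the mark, then put back
    # whatever the language calls for (one blank or nothing).
    return re.sub(r" *([:;])", lambda m: gap + m.group(1), text)


french = space_punctuation("Exemple: un; deux", "fr")
english = space_punctuation("Example : one ; two", "en")
```

The sketch treats every ':' and ';' alike, so it would also put a blank inside a French-tagged URL or time of day; a real program would need to know more about its local context than just the language.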
