In a message dated 2001-10-23 11:13:14 Pacific Daylight Time, [EMAIL PROTECTED] writes:
> On the other hand, one problem is more severe
> than in the Chinese case: in the general case, a Serbo-Croatian
> string written in Cyrillic cannot be distinguished, on a
> character string basis, from uses of Cyrillic for other languages
> (e.g., Russian), which should not be mapped and, similarly, a
> string written in Roman-based characters cannot be distinguished,
> on a character string basis, from the Roman-based characters of
> another language (English?) which, again, cannot be mapped.

But this problem *does* exist in the Chinese case, because certain Han
characters can also be used to write Japanese or (I've been told) Korean.
In a Japanese or Korean context, it wouldn't make any sense to map the
correct "traditional" Han character to a simplified "equivalent"; the
simplified character is only equivalent if the language is Chinese. And
we're not tagging languages, so we don't know when this mapping is
appropriate and when it's not.

-Doug Ewell
 Fullerton, California
