On Wed, 5 Dec 2001 01:17:36 EST [EMAIL PROTECTED] writes:
> In a message dated 2001-12-04 20:11:19 Pacific Standard Time,
> [EMAIL PROTECTED] writes:
>
> >> SC/TC equivalence itself is far simpler than the "four winds, two eggs"
> >> equivalences, and has quite a bit of merit. I won't express any
> >> real opinion on it until I study it further.
> >
> > It is not so simple as to be able to be done _accurately_ by an code-based 1-1
> > bit-string matching process. There are semantic, syntactic and contextual
> > considerations that require at the very least a morphological analysis process
> > in order for TC/SC to be done with a reasonable amount of accuracy (i.e.
> > orthographically).
>
> Thanks for saying with some authority what I have apparently been unable to
> communicate effectively, namely that TC/SC is not merely a 1-1 operation
> comparable to Latin case folding.
>
> -Doug Ewell
> Fullerton, California

Excuse me for jumping in. I have kept silent on this point so far, but I would like to comment on it now.

TC/SC is not merely a 1-1 operation if you compare it with Latin case folding only by what the names imply: TC/SC is a subset of Han, and Han is a subset of C, J, K, whereas Latin is a superset of English, French, and so on. Can you see the flaw in such a comparison?

When you look at Latin in the context of UCS code points (since UCS is the set we are hoping to use across the board in IDN), Latin is a subset of (Latin + Armenian + Cyrillic + Hebrew), since I think this is the area where Latin is most likely to be used as well.

So if you compare only against the 1-1 cases of TC/SC, then Latin is 1-1. If you compare against TC/SC with its 1-n, n-1, and 1-1 cases, that is, as it is in Chinese, then Latin should be considered across UCS Planes 0, 1, and 2 too, and in that sense Latin is n-1 and 1-n as well. If you compare TC/SC in the sense of the C, J, K block, then Latin + Armenian is the minimum case to think about.

Cheers.
Liana
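
To make the 1-1, 1-n, and n-1 distinction above concrete, here is a minimal sketch in Python. It is not from the original thread. The table `SC_TO_TC` and the helpers `mapping_kind`, `invert`, and `latin_fold_kind` are hypothetical names, and the three character pairs are only a small illustrative sample, not a real TC/SC mapping table. The Latin side uses Python's built-in `str.casefold()` as a stand-in for "Latin case folding".

```python
# Illustrative sample only: a few well-known Simplified -> Traditional pairs.
# This is NOT a complete or authoritative TC/SC table (assumption for the sketch).
SC_TO_TC = {
    "\u56fd": ["\u570b"],            # 国 -> 國        (1-1)
    "\u53d1": ["\u767c", "\u9aee"],  # 发 -> 發 or 髮  (1-n)
    "\u540e": ["\u540e", "\u5f8c"],  # 后 -> 后 or 後  (1-n)
}


def mapping_kind(targets):
    """Classify one source character by how many targets it maps to."""
    return "1-1" if len(targets) == 1 else "1-n"


def invert(table):
    """Build the reverse (Traditional -> Simplified) table.

    Several Traditional characters can share one Simplified character,
    which is the n-1 direction of the same relation.
    """
    inverse = {}
    for source, targets in table.items():
        for target in targets:
            inverse.setdefault(target, []).append(source)
    return inverse


def latin_fold_kind(ch):
    """Classify a single character by the length of its case-folded form."""
    return "1-1" if len(ch.casefold()) == 1 else "1-n"


if __name__ == "__main__":
    for source, targets in SC_TO_TC.items():
        print(source, "->", "/".join(targets), mapping_kind(targets))
    for target, sources in invert(SC_TO_TC).items():
        print(target, "->", "/".join(sources))
    # "A" folds to one character; "\u00df" (sharp s) folds to two.
    for ch in ("A", "\u00df"):
        print(repr(ch), latin_fold_kind(ch))
```

In this sample, a code-point-for-code-point match only works for the 1-1 row; the 1-n rows need context to choose the right Traditional form. The sharp-s case shows that Latin folding is not purely 1-1 either once you look beyond the basic alphabet.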
