John Hudson said:

> This is all very interesting as a
> cultural phenomenon, but nothing
> to do with computer encoding.
More than that, it ignores some basic facts about encodings. For example, English has multiple pronunciations for several of its letters, yet Unicode does not encode separate characters for them. Relatedly, languages like Spanish pronounce some of the same letters differently from English, and yet these "other letters" are not encoded again. There are countless examples of such things, things that Unicode does not do even though they would in theory make NLP easier -- because they would make many other things harder (and there are many petabytes of data that do not have such "features", meaning that the features would not solve the problem).

Sinnathurai Srivas has been espousing this very same argument for at least a decade, and no amount of explanation of the facts sways him. He often refers to "science", but it is that unique form of science, practiced by some who are not scientists, that ignores facts and evidence and truth -- the form whose only purpose is to keep yelling the same statement over and over again in the hope that everyone will accept it as truth. Some refer to it as pseudoscience.

I admire the persistence, but I do not admire the refusal to understand that even agreeing with him would not change anything, because what he wants is out of scope for Unicode.

I find myself weary of it, to be honest....

MichKa
