Hi, Doug:

With a character set as large as UCS, mixed use of scripts and look-alike characters are so prominent that some sort of classification is unavoidable.

> Should it be Spanish or Italian? Should we care?

No, we don't care about these cases, since they are all Latin users. Latin already takes care of many different spoken languages, and so do Cyrillic, Arabic and Chinese.

On Mon, 26 Nov 2001 11:40:52 EST [EMAIL PROTECTED] writes:
> In a message dated 2001-11-26 0:31:52 Pacific Standard Time,
> [EMAIL PROTECTED] writes:
>
> > Have you thought about " Mixed language URLs "
> > with language tags, for example:
> >
> > www.zh-china/mo-mogolia/zh-county/mybusiness.com
> >
> > shall be able to work?
>
> I thought one of the fundamental characteristics of domain names,
> host names, URLs, etc. is that they were identifiers, not true names,
> and hence they were not intended to be language-tagged.
>
> Just as an example, two popular search engines are teoma.com and
> altavista.com. What language is "Teoma"? Is "Alta Vista" supposed
> to be Spanish or Italian? Should we care?
>
> -Doug Ewell
> Fullerton, California

However, mixed script is a lot more complex. For example, Japanese uses Kanji, but it is not only phonetically different from Chinese; its grammar is completely different as well. The difference is so great that we have to reflect it separately in structured data too. The Chinese address label in my previous message is one example of such a difference. My Mongolian and Chinese example is another, showing mixed use of structured labels, though the Chinese group has not raised this in front of this group. They have too much on their hands already :-(

The classification, by language tags or script tags, must sometimes be used in URLs to deal with these issues. I have used "language tag" instead of "script tag", since:

1) Different languages use the same script, such as CJK.
2) Language tags have been defined in [iso639], and some of the issues have been solved already. For example: does Cantonese have a language tag or not?
3) From an engineering point of view, the IETF has a list of language requirements to consider.

That is, can we come up with a solution to cover these cases in DNS? If in the future, down the line in IDN, someone challenges us regarding diacritic marks between French and Dutch, for example, then we have to be able to say that this case is covered by a Latin tag. If someone wants more localized features with a French tag, then we may question whether such a feature can be accommodated with existing methods before we launch into another tag.

I would suggest the following language tags be considered first:

CJK
Latin
Cyrillic
Arabic
Bengali
Greek

Although Greek has a smaller group of native users, it is familiar to many Latin users, and can serve as a case study in the discussion.

Liana
