Hi John,

(For the record, I think you have got me wrong. My previous post was merely meant to point out the inappropriate comparison of TC-SC equivalence to Japanese character equivalence; it was not intended to justify (or otherwise) the inclusion of language-dependent canonicalization processes such as TC-SC in the DNS layer.)
John C Klensin wrote:
> The second question is whether that set of mappings/
> conversions/translations ought to be incorporated into IDN or
> the DNS. And it is _there_ that we differ. Once again, the DNS
> incorporates strings of individual characters, not names. The
> further we get away from doing bit-string-level matching, the
> more trouble we get ourselves into, and TC-SC mappings are
> pretty far from bit-string matching.

No, I don't think we differ much on this (see my first paragraph in this mail). I agree that language-dependent canonicalization may be far too complex to be included in the DNS layer. Doing it at Layer 2 and above seems like a more plausible solution, since more locale-specific parameters (or facets, as you call them) can be gathered there, minimizing the risk of incorrectly interpreting equivalences when only bit-string matching is used.

But the million-dollar question is this. Given that:

1. average Internet users probably do not care where the i18n is done, as long as they can access resources with names in their own native language, and
2. i18n of names can be achieved at layers other than the DNS,

then why internationalize the DNS in the first place, and why continue the work of the IDN WG, if the DNS only serves to provide an identifier-based (and not name-based) lookup service for Internet hosts and services?

IMHO, if we have decided to go ahead with IDN and we have come this far, perhaps we should aim to provide a solution as comprehensive as possible, lest users be bewildered when they cannot resolve their hostnames due to non-equivalence of characters. But if a comprehensive solution cannot be done at the DNS layer, then we should not have to live with a half-baked solution, and the resources spent on this WG should be diverted to IRNSS instead.
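Just to make the bit-string point concrete, here is a quick Python sketch of my own (not anything from John's draft): the TC and SC forms of the word "country" are distinct Unicode code points, so the octet-by-octet comparison the DNS performs can never match them, and any equivalence must come from an external, language-dependent table (the mapping dict below is purely hypothetical).

```python
# Traditional and Simplified forms of "country" are distinct code points.
traditional = "\u570B"  # 國 (Traditional Chinese)
simplified = "\u56FD"   # 国 (Simplified Chinese)

# The DNS matches labels octet by octet (modulo ASCII case folding),
# so the encoded forms never compare equal:
print(traditional.encode("utf-8"))  # b'\xe5\x9c\x8b'
print(simplified.encode("utf-8"))   # b'\xe5\x9b\xbd'
print(traditional == simplified)    # False

# Treating them as "the same word" requires an external,
# language-dependent mapping table (hypothetical example):
tc_to_sc = {"\u570B": "\u56FD"}
print(tc_to_sc.get(traditional) == simplified)  # True
```

Nothing in the lookup path itself knows these are the same word; the knowledge lives entirely in the mapping table, which is exactly the language-dependent canonicalization at issue.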
> The logic "I should be able to communicate with my Taiwanese
> friends, even though we don't use the same characters to write
> the same words" doesn't work, at least in this DNS context, any
> more than "I should be able to communicate with my
> Arabic-speaking friends, even though I can't read their
> language" does. If I said that about Arabic, I would be justly
> criticized for expecting the DNS to compensate for my
> ignorance.

Now, I might justifiably be considered ignorant if I were to expect a computer to help me understand written Arabic when I only understand the spoken form. However, the inability to perform mental TC-SC equivalence matching is common among many Chinese speakers/writers, largely due to differences in educational policy. Many Taiwanese do not read Simplified characters. Most mainland Chinese no longer use Traditional characters (although they can read them). Construing this inability as ignorance is thus almost a form of culturo-linguistic arrogance (albeit one with a geo-political dimension).

Perhaps I might not expect the DNS to do it, but that is because I know what the DNS is and what it was originally constructed to do, having a better appreciation of its (limited) responsibilities after reading your dns-role draft. The average Internet user probably does not. And these users, especially those with a monolithic view of how computer systems work, expect a certain level of "assistance" (for lack of a better term) to be rendered to them by the system (which encompasses all layers, not just the DNS layer). I think there is an explicit requirement from the users for this "assistance" to be rendered, regardless of which layer it is done at.

> > Compare this with Japanese; I do not think you can find two
> > Japanese-speaking individuals, one having knowledge of "egg"
> > in Kanji ONLY and one having knowledge of "egg" in Hiragana
> > ONLY.
> >
> > Chances are most Japanese-speakers know both equivalent
> > forms.
> > Chinese-speakers may not.
>
> At least in some styles of teaching Japanese outside of Japan,
> Hiragana is taught first. So finding someone who cannot
> recognize a given Kanji character, even when the Hiragana for
> the word is known or can be guessed, merely requires finding
> someone young enough or new enough to the language.

Yes, I agree that in this case a Japanese speaker may not be able to recognize a Kanji character yet may recognize its Hiragana equivalent. (After all, it is not uncommon to find Furigana hints alongside Kanji characters in Japanese textbooks to aid the pronunciation of not-so-adept Japanese speakers.) However, you are comparing Japanese speakers by their level of language mastery (or fluency); a comparison between young Japanese speakers and mature ones is akin to comparing unripened apples to fully ripe ones. The difference between TC and SC is better appreciated as one between apples and pears: the respective trees probably belong to the same genus/species, but the fruits are distinct enough for us to name them differently. (I privately doubt the botanical accuracy of this statement, for I am no botanist, but assume it to be true for the purposes of this analogy.)

Let me rephrase my earlier statement for greater clarity: a _fluent_ Japanese speaker/writer would probably know both forms of "egg" in Japanese, but a _fluent_ Chinese speaker/writer may not know both the TC and SC forms of "egg" in Chinese. Thus, the argument for including Japanese character equivalences in naming/identification engines is less compelling than the one for Chinese. One should not construe all forms of CJK equivalence as having the same necessity of inclusion in a canonicalization scheme, because they do not.

regards,
maynard
