I've been looking at the "Internationalized Domain Names Registration and Administration Guideline for Chinese, Japanese and Korean" (draft-jseng-idn-admin-01.txt).
It looks like an interesting language for expressing a policy when there are characters that could be seen as variants of each other. As I'm from a Norwegian background, I've tried using the guidelines in the draft to describe an administrative domain name policy for the Norwegian language, with a language character variant table.

In Norwegian there are few (if any) characters that may be said to be variants of each other in all instances where they are used. The closest we get is perhaps dropping the accents (using o as the predominant variant of ó and ò), but as e.g. "for", "fór" and "fòr" have three different meanings (for, travelled and furrow), it is not immediately given that they should be stuck together in one IDL package.

There are also certain characters that are imported from neighbouring languages and used in Norwegian names, and that sound similar in speech to already existing Norwegian characters (e.g. ö and ø). These may be considered variants of each other. (And even if in the end the administrative policy is that no character has any variants, the draft does give a way of easily expressing both that fact and which characters are within the set of valid code points.)

Having tried to use it, I have a few questions concerning the interpretation of the different groups in the language character variant table. As far as I can see, the recommended variants must also be valid code points, and result in domain names that are in the zone file, while character variants are merely reserved when a valid combination is registered and are not in themselves added to the zone file. In addition, character variants that aren't valid code points can't be the "starting point" for an IDL package.

In other words, take a language character variant table where the letters a-z are among the valid code points but have no recommended variants or character variants (except the letter itself), with the following lines in addition:

  00F8; 00F8; 00F6;    (ø; ø; ö;)
  00F6; 00F8; 00F8;    (ö; ø; ø;)

If I've understood correctly, an application for the domain name bjørn.no will result in an IDL package consisting of bjørn.no and björn.no, where bjørn.no is added to the zone file and björn.no is reserved. (As ø is the recommended variant, while the poor ö is just a character variant.)

What happens according to this policy if the domain name applied for is björn.no? Will the registered name still be bjørn.no, while björn.no is reserved?

And if the language table only had a-z and

  00F8; 00F8; 00F6;    (ø; ø; ö;)

would that mean that while one could still apply for bjørn.no (and get björn.no as a reserved name), someone applying for björn.no would get rejected, as that name contains a character that is not part of the valid code points?

Sorry for asking the basic questions :-) but as the draft states, the quality of the language table is critical for the result, which means that the logic behind how the table is built is important.

And while I'm asking questions, have any of you in the WG used the draft for creating a draft policy for a language?

- Hilde Thunem
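
P.S. To make my reading of the table concrete, here is a minimal Python sketch of it. It is only my interpretation, not anything the draft specifies: the function names (build_table, idl_package), the in-memory table layout and the constant names are my own inventions, and I've assumed that the applied-for name always counts as part of the package. Labels are handled without the .no suffix.

```python
from itertools import product

OSLASH = "\u00f8"  # LATIN SMALL LETTER O WITH STROKE
OUML = "\u00f6"    # LATIN SMALL LETTER O WITH DIAERESIS


def build_table(extra_rows):
    """Build a table: valid code point -> (recommended variants, character variants).

    a-z are valid and have only themselves as variants; extra_rows adds
    (valid, recommended, character_variants) rows on top of that.
    """
    table = {c: ({c}, {c}) for c in "abcdefghijklmnopqrstuvwxyz"}
    for valid, recommended, variants in extra_rows:
        table[valid] = (set(recommended), set(variants))
    return table


def idl_package(label, table):
    """Return (zone_labels, reserved_labels), or None if the label is rejected."""
    # A label containing anything outside the valid code points is rejected.
    if any(ch not in table for ch in label):
        return None
    # Per position: the recommended variants, and every variant
    # including the applied-for character itself.
    recommended = [sorted(table[ch][0]) for ch in label]
    everything = [sorted(table[ch][0] | table[ch][1] | {ch}) for ch in label]
    zone = {"".join(p) for p in product(*recommended)}
    package = {"".join(p) for p in product(*everything)}
    # Assumption: only recommended-variant labels go in the zone file;
    # the rest of the package is merely reserved.
    return zone, package - zone


# The two tables from the questions above (hypothetical encoding).
TWO_ROWS = build_table([(OSLASH, [OSLASH], [OUML]), (OUML, [OSLASH], [OSLASH])])
ONE_ROW = build_table([(OSLASH, [OSLASH], [OUML])])
```

With TWO_ROWS, idl_package gives the zone label bjørn and the reserved label björn whether bjørn or björn is applied for. With ONE_ROW, applying for bjørn gives the same result, while applying for björn returns None (rejected). Whether that is what the draft intends is what I'm asking above.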
