John,

Thank you for taking the time to write such a well-thought-out response. I agree with some of the points you make, but I'm going to present arguments against the others. I'm currently leaning towards *not* changing IDNA (other than to fix mistakes and clarify some sections).

John C Klensin wrote:

(1) To the extent possible, we should accommodate all Unicode characters, excluding as little as possible. This position was reinforced by the view that, at the time, the Unicode classifications of characters were considered a little soft, and by a general conviction that the IETF should not be making character-by-character decisions. A counter-principle, now if not then, is that we should permit a relatively narrow extension of the "letter-digit-hyphen" rule, i.e., permitting only letters (in any alphabet or script), perhaps local digits, and the hyphen, but no other punctuation, symbols, drawing characters, or other non-letter characters. Adam has argued for that revised principle recently; several people argued for it when IDNA was being produced. We could probably still impose it, and, in any event, it would not require a change in the basic architecture (see below).

I believe it would be difficult to reach consensus on a relatively narrow extension of the LDH rule. Just for starters, the hyphen used to separate names and other strings in the West is not used that way with Katakana in Japan; instead, a middle dot (KATAKANA MIDDLE DOT, U+30FB) is used to separate two Katakana strings. In fact, this character is already allowed in .jp.
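
Just to make the point concrete, here is a rough sketch of what a "letters, local digits, hyphen" check might look like (Python, standard unicodedata module only; the rule is my own paraphrase of the proposed narrow extension, not anything from a spec):

    import unicodedata

    def narrow_ldh_ok(label):
        # Allow Unicode letters, decimal digits and U+002D HYPHEN-MINUS,
        # and nothing else -- one reading of the "narrow extension" idea.
        for ch in label:
            cat = unicodedata.category(ch)
            if ch != '-' and not cat.startswith('L') and cat != 'Nd':
                return False
        return True

    print(narrow_ldh_ok('bücher'))             # True: letters only
    print(narrow_ldh_ok('アニメ\u30fbゲーム'))   # False: U+30FB is punctuation (Po), not a letter

Under a rule like that, the Japanese middle-dot label is simply invalid, which is exactly the outcome I expect people to object to.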


If we do *not* allow these special local characters that function in the same way as the hyphen in the West, then people in other parts of the world would not only claim that our spec is unfair, they might even ignore it. If we *do* allow this Japanese example, then we have started sliding down a slippery slope that ends with a rather large extension of the LDH rule (for the rest of the world), and then the phishing problem would not be alleviated as much as we might have hoped when we started with just LDH. This would be a lot of work for little gain.

So it's a lose-lose situation. Instead, we should probably stick to IDNA's original principle of allowing a lot of Unicode, and have the local registries, zone administrators and apps address the phishing problem.

(2) When code points had been identified by UTC as the same as,
or equivalent to, others, we tended to map them together, rather
than picking one and prohibiting the others.   This has caused
more problems than most of us expected, with people being
surprised when they register or query using one character and
the result that comes back uses another.  It also creates a
near-homograph problem that we haven't "discovered" in the last
couple of weeks: If we have character X mapping to character Y,
but X looks vaguely like Z, then there may be no Y-Z homograph,
but there may be an X-Z one.  That could make display decisions,
etc., quite critical and, unless applications got it entirely
right, we might end up with a new family of attacks.  Again,
that decision could be reviewed.  Perhaps there are groups of
characters that should be prohibited from being included in a
lookup or registration operation, not just mapped to something
more reasonable.  And, again, this would be a tuning of tables,
not a change in the basic architecture.

It may be possible to "tune" the tables, but nowhere in your email do I find any reference to the ACE prefix. I think that we should also figure out exactly which types of changes would absolutely require a new ACE prefix, and then explore in detail what all the affected parties would have to do to add a new prefix to the mix or to transition to it. The parties I'm thinking of are app developers and registries, mostly, but content developers might also be affected.
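
For reference, the mapping and the ACE prefix interact roughly like this today (Python's built-in "idna" codec implements the original ToASCII/nameprep behaviour; treat the exact outputs as illustrative rather than normative):

    import unicodedata

    # The ACE prefix "xn--" is what any new prefix would have to coexist with,
    # and case is mapped away before encoding:
    print('Bücher'.encode('idna'))          # b'xn--bcher-kva'
    print(b'xn--bcher-kva'.decode('idna'))  # 'bücher' -- the round trip returns the mapped form

    # The same many-to-one mapping happens through normalization:
    print(unicodedata.normalize('NFKC', '\ufb01'))  # 'fi' -- the "fi" ligature becomes two letters

This is the "register with one character and get another back" behaviour you describe, and any table change that alters such mappings is precisely the kind of change we would need to classify as prefix-breaking or not.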


The assumption I referred to above was that ICANN would take a
strong role in determining which characters were really
appropriate for registration and under what circumstances, that
they would institute and enforce appropriate rules, and that
everyone relevant would pay attention to whatever they said.
Every element of that assumption has turned out to be false:
they haven't taken that role; their guidelines are weak,
ambiguous, and at least partially wrong; and some registries
have just ignored the rules that do exist without any penalty.
If there is a problem, either we are going to need to solve it,
or we are going to risk different solutions in different
applications that, taken together, compromise interoperability.

I'm currently thinking that we (the IETF) can't really solve these problems, and that the registries and apps are going to have to address them. But I strongly sympathize with your stated concern about differing solutions leading to interoperability problems, so I think "we" (not necessarily the IETF) must come up with much better registry guidelines, and even recommendations and proposals for the apps. Such documents would not necessarily be IETF documents, though they could be if they are merely Informational (not standards track). Other organizations like ICANN could then take some of that material and fold it into their own documents, probably making some of it normative (MUST). There isn't really a single organization for the apps (the W3C doesn't cover them all), so an IETF Informational RFC might be a good vehicle for them.


If we can get past "right to register", we need to look at
the experience of the browser implementers who have already
concluded that, registered or not, they really don't want to
recognize or process domain names containing such characters.

Some of these implementers might decide to disable IDNA labels under some circumstances, but the existence of a number of IDN plug-ins for MSIE, the extensibility of Mozilla, and the worldwide need for IDNs all suggest that their decisions may be circumvented. Eventually, these implementers may decide to improve their own IDN support. I realize that the short-term decisions may be bad for IDN, but I am hopeful for the future.
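
For what it's worth, the kind of per-application defence I have in mind looks something like the sketch below (a deliberately crude mixed-script test in Python; real display policies would need to be far more careful, and the function name and heuristic are mine, not from any browser):

    import unicodedata

    def looks_mixed_script(label):
        # Crude heuristic: flag labels that mix Latin letters with letters
        # from any other script -- the classic homograph-phishing pattern.
        letters = [c for c in label if unicodedata.category(c).startswith('L')]
        latin = [c for c in letters if 'LATIN' in unicodedata.name(c, '')]
        other = [c for c in letters if 'LATIN' not in unicodedata.name(c, '')]
        return bool(latin) and bool(other)

    print(looks_mixed_script('p\u0430ypal'))  # True: U+0430 is CYRILLIC SMALL LETTER A
    print(looks_mixed_script('ゲーム'))        # False: Katakana only

An application that flags a label like that can fall back to displaying the xn-- form, without any change to the protocol itself.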


Erik


