[idn] IDNA problem statement

Erik Nordmark Tue, 22 Oct 2002 18:32:40 -0700

I've made some minor edits to this based on the discussion on the mailing
list, plus some other comments I've received.


  Erik


1.1 Problem Statement

The IDNA specification solves the problem of extending the repertoire 
of characters that can be used in domain names to include the Unicode 
repertoire (with some restrictions).

IDNA does not extend the service offered by DNS to the applications. Instead,
the applications (and, by implication, the users) continue to see an
exact-match lookup service. Either there is a single exactly-matching name
or there is no match. This model has served the existing applications well,
but it requires, with or without internationalized domain names,
that users know the exact spelling of the domain names
that they type into applications such as web browsers and mail user
agents. The introduction of the larger repertoire of characters potentially
makes the set of misspellings larger, especially given that in some
cases the same visual appearance, for example on a business card, might match
several Unicode code points or several sequences of code points.
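
As a rough illustration of the exact-match point (not part of the
proposed text), the sketch below compares two labels that can look the
same in many fonts but use different code points. The labels and the
helper name to_ace are made up for this example, and Python's built-in
"idna" codec is assumed as a stand-in for an IDNA implementation.

```python
# Illustration only: two labels that can render identically but differ
# in their code points, so an exact-match lookup treats them as different.
latin = "bank"            # all Latin letters
lookalike = "b\u0430nk"   # U+0430 CYRILLIC SMALL LETTER A replaces Latin 'a'


def to_ace(label: str) -> bytes:
    """Return the ASCII form of a label using Python's "idna" codec."""
    return label.encode("idna")


# The strings differ, and so do the ASCII forms that would be looked up.
print(latin == lookalike)
print(to_ace(latin), to_ace(lookalike))
```

A user copying either label from a business card has no visual cue
about which one was intended.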

IDNA allows the graceful introduction of IDNs not only by avoiding
upgrades to existing infrastructure (such as DNS servers and mail 
transport agents), but also by allowing some rudimentary use of IDNs in 
applications by using the ASCII representation of the non-ASCII name labels.
While such names are very user-unfriendly to read and type, and hence are
not suitable for user input, they allow (for instance) replying to email
and clicking on URLs even though the domain name displayed
is incomprehensible to the user. In order to allow user-friendly
input and output of the IDNs, the applications need to be modified to
conform to this specification.
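
To make the ASCII representation concrete, here is a minimal sketch
using Python's built-in "idna" codec. That codec implements IDNA as
later published in RFC 3490, which may differ in detail from the draft
under discussion here; the domain name and the helper names are
illustrative assumptions only.

```python
# Sketch: convert between the user-friendly form of a name and the ASCII
# form that unmodified infrastructure (DNS servers, MTAs) would carry.

def to_ascii(name: str) -> bytes:
    """Encode each label of a domain name to its ASCII representation."""
    return name.encode("idna")


def to_unicode(ace_name: bytes) -> str:
    """Decode an ASCII representation back to the displayable form."""
    return ace_name.decode("idna")


ace = to_ascii("b\u00fccher.example")   # "bücher.example"
print(ace)               # the hard-to-read form an old application shows
print(to_unicode(ace))   # the form an upgraded application can display
```

An application that has not been upgraded can still pass the ASCII form
around unchanged, which is what makes replying to email or following a
link possible even when the displayed name is unreadable.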

IDNA uses the Unicode character repertoire, which avoids the significant
delays that would be inherent in waiting for a different 
and specific character set to be defined for IDN purposes by some other
standards developing organization.

1.2 Limitations of IDNA

<EXISTING section 6.6 moved to here>
The IDNA protocol does not solve all linguistic issues with users
inputting names in different scripts. Many important language-based and
script-based mappings are not covered in IDNA and need to be handled
outside the protocol. For example, names that are entered in a mix of
traditional and simplified Chinese characters will not be mapped to a
single canonical name. As another example, Scandinavian names that are
entered with U+00F6 (LATIN SMALL LETTER O WITH DIAERESIS) will not be
mapped to U+00F8 (LATIN SMALL LETTER O WITH STROKE).
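
The Scandinavian example can be checked with a short sketch. It assumes
Python's built-in "idna" codec (which follows the later RFC 3490 and
its Nameprep profile) behaves like the protocol described here; the
sample name and the helper to_ace are hypothetical.

```python
# Sketch: the protocol folds case, but it does not apply language-based
# mappings such as treating o-with-diaeresis as o-with-stroke.

def to_ace(label: str) -> bytes:
    """Return the ASCII form of a label using Python's "idna" codec."""
    return label.encode("idna")


with_diaeresis = "bj\u00f6rn"   # U+00F6 LATIN SMALL LETTER O WITH DIAERESIS
with_stroke = "bj\u00f8rn"      # U+00F8 LATIN SMALL LETTER O WITH STROKE

# Case differences are mapped away before the lookup ...
print(to_ace(with_diaeresis.upper()) == to_ace(with_diaeresis))
# ... but the two spellings remain distinct names.
print(to_ace(with_diaeresis) == to_ace(with_stroke))
```

Any equivalence between the two spellings would have to be provided
outside the protocol, for example by registering both names.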

<ADDED>
An example of an important issue that is not considered in 
detail in IDNA is how to provide a high probability that a user who 
is entering a domain name based on visual information (such as from a 
business card or billboard) or aural information (such as from a 
telephone or radio) would correctly enter the IDN.
Similar issues exist for ASCII domain names, for example
the possible visual confusion between the letter 'O' and the digit zero,
but the introduction of the larger repertoire of characters creates
more opportunities for similar-looking and similar-sounding names.
Note that this is a complex issue relating to languages, input methods on
computers, and so on.  Furthermore, the kind of matching and
searching necessary for a high probability of success would not fit
the role of the DNS and its exact matching function.

