----- Original Message -----
From: "Soobok Lee" <[EMAIL PROTECTED]>
To: "IETF idn working group" <[EMAIL PROTECTED]>
Sent: Sunday, March 24, 2002 9:36 PM
Subject: Re: [idn] URL encoding in html page
>
> ----- Original Message -----
> From: "Soobok Lee" <[EMAIL PROTECTED]>
>
> > > Not necessary, since the HTML and URI specs already limit the host to
> > > ASCII letters, digits, hyphens, and dots.
> >
> > We experts already knew this. But many ML.com registrants don't know about
> > this poor destiny of ML.com. They want to use native ML.com in their HTML
> > homepages.
> >
> > If we want an interoperable URI that supports native IDN, we should revise
> > BOTH the URI spec and the HTTP spec. But native IDN support brings potential
> > legacy code versioning and code interoperability problems.
> > Would anyone provide an in-depth analysis of this caveat?
> >
>
> Even if we stay with the current HTTP/1.1, which allows only ASCII host: header
> values, we could still revise the URI spec to allow native (utf8 or legacy
> encoding) IDN in URIs.
>
> 1) With IDNA and HTTP/1.1, the web browser can encode the native IDN in the URI
>    into its ACE form, and then open an HTTP/1.1 session to the ACE hostname
>    with an ACE host: value.
>
> 2) With IDNA and a revised HTTP with utf8 host support, the web browser can
>    encode the utf8 IDN in the URI into its ACE form, and then open an HTTP
>    session to the ACE hostname with a utf8 host: value.
>
> 3) With UTF8-based IDN and a revised HTTP with utf8 host support, it can check
>    whether the native IDN is in utf8 and, if not, convert the IDN into utf8,
>    and then open an HTTP session to the utf8 web host with a utf8 host: value.
>
> 2) and 3) may be infeasible due to HTTP's lack of a capability negotiation
> feature like that of ESMTP,

s/and 3)// :-) In 3), the web server surely supports a native utf8 host: value.

> because the new web browser with native IDN URI support can't decide whether
> the web server supports native IDN or only an ASCII (ACE) host in the HOST:
> value before trying twice with both forms of the host: value (utf8 first, and
> then ACE if needed). Using an ACE host: value is always safe in 1) and 2).
>
> BTW, in 1) and 2), we cannot avoid legacy versioning problems, because
> most ACE conversion would be done by
> "ACE(NFKC(CaseFold(legacy-to-Unicode(native label))))".
> Most homepages in East Asia are in legacy encodings, and that monopoly
> (near 100%) won't change in the foreseeable future.
>
> New legacy codes may be created after IDN-aware browsers are distributed.
> Old legacy codes may get new code points for newly added characters.
> If IDN-aware browsers/applications are not upgraded with the new
> legacy-to-Unicode mappings, they will occasionally fail to convert a
> legacy-encoded IDN into a Unicode one.
> That kind of IDN failure has never been seen in LDH DNS.
>
> Soobok Lee
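
The conversion chain quoted above, "ACE(NFKC(CaseFold(legacy-to-Unicode(native label))))", can be sketched roughly as follows. This is only an illustration under stated assumptions, not the algorithm of any draft: the thread does not name a specific ACE, so the sketch assumes Punycode with the "xn--" prefix (what IDNA later standardized), it uses Python's built-in codecs as the legacy-to-Unicode tables, and it omits the rest of nameprep. The function names are hypothetical.

```python
import unicodedata
from typing import Optional

# Assumption: Punycode with the "xn--" prefix stands in for "ACE" here.
ACE_PREFIX = "xn--"


def legacy_label_to_ace(raw: bytes, legacy_codec: str) -> str:
    """Apply ACE(NFKC(CaseFold(legacy-to-Unicode(label)))) to one label."""
    # legacy-to-Unicode: raises UnicodeDecodeError if the mapping table
    # has no entry for some byte sequence in the label.
    text = raw.decode(legacy_codec)
    folded = text.casefold()                             # CaseFold
    normalized = unicodedata.normalize("NFKC", folded)   # NFKC
    if normalized.isascii():
        # Plain LDH-style labels need no ACE form.
        return normalized
    return ACE_PREFIX + normalized.encode("punycode").decode("ascii")


def try_label_to_ace(raw: bytes, legacy_codec: str) -> Optional[str]:
    """Return the ACE label, or None when the legacy mapping step fails."""
    try:
        return legacy_label_to_ace(raw, legacy_codec)
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    # A label typed in a EUC-KR page converts with the EUC-KR table.
    print(legacy_label_to_ace("한국".encode("euc_kr"), "euc_kr"))

    # A syllable encoded with the extended CP949 table converts only if
    # the application knows that table; decoding the same bytes with the
    # older EUC-KR table is expected to fail, which is the versioning
    # problem described above.
    newer_bytes = "똠".encode("cp949")
    print(try_label_to_ace(newer_bytes, "cp949"))
    print(try_label_to_ace(newer_bytes, "euc_kr"))
```

The failure in the last call arises before any ACE step is reached, so it depends only on which legacy-to-Unicode table the application shipped with, not on the DNS side.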
