Compliant browsers already have to handle Unicode, since NCRs (e.g. &#x1234;) are always Unicode code points. All XML parsers also have to handle Unicode (UTF-8 and UTF-16).
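To make the NCR point concrete, here is a minimal Python sketch. It assumes the standard-library `html.unescape` as a stand-in for a browser's character-reference handling; the helper name `decode_ncr` and the sample string are illustrative only and do not come from this thread. It also compares byte lengths for one Korean string, which bears on the "compact" argument quoted below.

```python
import html


def decode_ncr(reference: str) -> str:
    """Resolve a numeric character reference to the character it names."""
    return html.unescape(reference)


# An NCR names a Unicode code point directly, whatever byte encoding
# the surrounding page happens to use.
char = decode_ncr("&#x1234;")
print(hex(ord(char)))

# Illustrative sample (an assumption, not from the thread): the same two
# Hangul syllables in a legacy encoding and in UTF-8.
sample = "한국"
euc_kr_bytes = sample.encode("euc_kr")
utf8_bytes = sample.encode("utf-8")
print(len(euc_kr_bytes), len(utf8_bytes))
```

The hexadecimal and decimal forms of a reference resolve to the same code point, so a client that handles NCRs at all already carries a Unicode character model.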
> Legacy encodings
> will dominate even in the future, because they are compact and
> inexpensive.

While I do expect the transition to Unicode to take some time, once some of the older browsers die off it may shift more rapidly than we think.

Mark
—————

Γνῶθι σαυτόν — Θαλῆς
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]

http://www.macchiato.com

----- Original Message -----
From: "Soobok Lee" <[EMAIL PROTECTED]>
To: "IETF idn working group" <[EMAIL PROTECTED]>
Sent: Friday, March 22, 2002 02:04
Subject: Re: [idn] URL encoding in html page

>
> ----- Original Message -----
> From: "Bruce Thomson" <[EMAIL PROTECTED]>
> To: "Soobok Lee" <[EMAIL PROTECTED]>; "IETF idn working group" <[EMAIL PROTECTED]>
> Sent: Friday, March 22, 2002 6:29 PM
> Subject: Re: [idn] URL encoding in html page
>
>
> > > What if all the viewable HTML text is in English, but only the href URL contains
> > > legacy (Korean) encoded hostnames? Chinese visitors would see a clean English homepage,
> > > but fail to click through the Korean link.
> > >
> >
> > Well, that could happen, but a META tag would solve that so easily. Personally
> > I often use a simple text editor to deal with HTML, and would find it easier to
> > use legacy encodings or UTF-8 than cut-and-paste ACE from somewhere.
> > Of course the user could do it either way and it would work.
>
> Yes. Charset META tags help. But many homepages make assumptions about the main audience's
> default character encoding and very often omit the META tag for the encoding, such as:
> <meta http-equiv="Content-Type" content="text/html; charset=euc-kr">
>
> Moreover, an IDN URL would be used in a pure FRAMESET document that defines frame URLs
> and contains no viewable text. Such FRAMESET documents often omit charset META tags.
> (Look at the HTML source of http://www.freeway.co.kr/ )
>
> AFAIK, 99.99999% of Korean homepages have an implicit or explicit
> legacy Korean encoding (KS_C_5601-1987 or euc-kr). So do most Japanese/Chinese homepages.
> UTF8/UCS-2 encodings are rarely used in global Web publishing. Legacy encodings
> will dominate even in the future, because they are compact and inexpensive.
>
> If we want to make IDN truly internationally interoperable, all IDN-aware web browsers/applications
> should contain libraries of all kinds of legacy-to-Unicode conversion routines. That would put
> too much memory load on handheld devices such as PDAs.
>
> Moreover, legacy encodings are revised separately from Unicode. We may face versioning problems
> as tough as those we had with stringprep/nameprep versioning for newly added Unicode code points.
> How can we guarantee the stability and integrity of IDN operations across all combinations of the
> numerous kinds and versions of IDN-aware applications and legacy encodings?
>
> Soobok Lee
>
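The legacy-to-Unicode step discussed in this thread can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's built-in `euc_kr`, `gb2312` and `idna` codecs (the `idna` codec follows the later RFC 3490, which was not final when this thread was written), and the function name `hostname_to_ace` and the sample hostname are hypothetical.

```python
def hostname_to_ace(raw: bytes, charset: str) -> bytes:
    """Decode a legacy-encoded hostname to Unicode, then convert it to ACE.

    `charset` is whatever the client believes the page encoding to be,
    for example from a META tag or a locale default.
    """
    unicode_name = raw.decode(charset)
    return unicode_name.encode("idna")


# Hypothetical hostname as it might appear in an euc-kr page without a META tag.
legacy_bytes = "한국.kr".encode("euc_kr")

# With the right charset the name converts cleanly and round-trips.
ace = hostname_to_ace(legacy_bytes, "euc_kr")
print(ace, ace.decode("idna"))

# A client that assumes a different legacy charset recovers different
# characters from the same bytes, so it would look up a different name.
misread = legacy_bytes.decode("gb2312", errors="replace")
print(misread == "한국.kr")
```

The sketch shows why the charset has to be known before any IDN processing: the conversion to ACE is only as reliable as the legacy-to-Unicode decoding that precedes it.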
