----- Original Message -----
From: "Mark Davis" <[EMAIL PROTECTED]>
To: "Soobok Lee" <[EMAIL PROTECTED]>; "IETF idn working group" <[EMAIL PROTECTED]>
Sent: Saturday, March 23, 2002 12:18 AM
Subject: Re: [idn] URL encoding in html page

> Compliant browsers already have to handle Unicode, since NCRs (e.g.
> &#x1234;) are always Unicode code points. All XML parsers also have
> to handle Unicode (UTF-8 and UTF-16).

Right, they already do. MS IE and Netscape have supported Unicode for several years, but most homepages are still in legacy encodings. MS Word (already Unicode-based) has a feature to produce legacy-encoded .html files from Unicode-based .doc files for web publishing.

Korean/Japanese/Chinese text in UTF-8 is 50% bigger than in the legacy encodings (3 octets per character instead of 2), so 50% more disk space and bandwidth are required. Each Cyrillic letter occupies one octet in a legacy code but two octets in UTF-8, so 100% more space is needed. I cannot imagine all Russians making the transition to UTF-8. For these scripts, legacy encodings are more space-efficient than UTF-8.

Legacy-to-legacy conversions such as BIG5->KSX1001 are in practice implemented as two steps: BIG5->Unicode and Unicode->KSX1001. Unicode is actively used as such an intermediate encoding, but end users still do not use or enter it directly nearly as much. Rather, Unicode may be a hub that facilitates the interchange of information between different legacy encodings, or font sharing for characters in different legacy encodings. I regard Unicode as a substrate (not a competitor) upon which legacy encodings are built.

> > Legacy encodings
> > will dominate even in the future, because they are compact and
> > inexpensive.
>
> While I do expect the transition to Unicode to take some time, once
> some of the older browsers die off it may shift more rapidly than we
> think.

I am neither a Unicode expert nor a character-set expert. But every day I feel the strong inertia toward legacy encodings in our local language communities. Language-tagging-enabled text formats like HTML will greatly lengthen the lifespan of legacy encodings and allow legacy-encoded HTML texts to be interchanged internationally without problems.

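To make the octet counts and the two-step conversion concrete, here is a minimal sketch in Python. It assumes Python's standard codecs (euc_kr, koi8_r, big5) as stand-ins for the legacy encodings discussed; the function names and sample strings are only illustrative. With these codecs, Hangul goes from 2 to 3 octets per character and Cyrillic from 1 to 2.

```python
def encoded_sizes(text, legacy_codec):
    """Return (legacy_octets, utf8_octets) for the same text."""
    return len(text.encode(legacy_codec)), len(text.encode("utf-8"))


def convert_via_unicode(data, src_codec, dst_codec):
    """Legacy-to-legacy conversion that uses Unicode as the hub."""
    text = data.decode(src_codec)   # step 1: legacy -> Unicode
    return text.encode(dst_codec)   # step 2: Unicode -> legacy


if __name__ == "__main__":
    # Three Hangul syllables: 2 octets each in EUC-KR, 3 each in UTF-8.
    print(encoded_sizes("한국어", "euc_kr"))    # (6, 9)
    # Seven Cyrillic letters: 1 octet each in KOI8-R, 2 each in UTF-8.
    print(encoded_sizes("русский", "koi8_r"))  # (7, 14)
    # Big5 -> Unicode -> EUC-KR (KS X 1001) for two common ideographs.
    big5_bytes = "中文".encode("big5")
    print(convert_via_unicode(big5_bytes, "big5", "euc_kr"))
```

The second function raises UnicodeEncodeError when a character has no mapping in the target legacy encoding, which is one practical limit of using Unicode only as a hub.
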
Soobok Lee

> Mark
> —————
>
> Γνῶθι σαυτόν — Θαλῆς
> [For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]
>
> http://www.macchiato.com
>
> ----- Original Message -----
> From: "Soobok Lee" <[EMAIL PROTECTED]>
> To: "IETF idn working group" <[EMAIL PROTECTED]>
> Sent: Friday, March 22, 2002 02:04
> Subject: Re: [idn] URL encoding in html page
>
> > ----- Original Message -----
> > From: "Bruce Thomson" <[EMAIL PROTECTED]>
> > To: "Soobok Lee" <[EMAIL PROTECTED]>; "IETF idn working group" <[EMAIL PROTECTED]>
> > Sent: Friday, March 22, 2002 6:29 PM
> > Subject: Re: [idn] URL encoding in html page
> >
> > > > What if all the viewable HTML text is in English, but only the href URL contains
> > > > legacy (Korean) encoded hostnames? Chinese visitors would see a clean English homepage,
> > > > but fail to click through the Korean link.
> > >
> > > Well, that could happen, but a META tag would solve that so easily. Personally
> > > I often use a simple text editor to deal with HTML, and would find it easier to
> > > use legacy encodings or UTF-8 than cut-and-paste ACE from somewhere.
> > > Of course the user could do it either way and it would work.
> >
> > Yes. Charset META tags help. But many homepages make assumptions about the main audience's
> > default character encoding and very often omit the META tag for the encoding, such as:
> > <meta http-equiv="Content-Type" content="text/html; charset=euc-kr">
> >
> > Moreover, an IDN URL may be used in a pure FRAMESET document that defines frame URLs
> > and contains no viewable text. Such FRAMESET documents often omit charset META tags.
> > (Look at the HTML source of http://www.freeway.co.kr/ )
> >
> > AFAIK, 99.99999% of Korean homepages have an implicit or explicit
> > legacy Korean encoding (KS_C_5601-1987 or euc-kr). So do most Japanese/Chinese homepages.
> > UTF8/UCS-2 encodings are rarely used in global web publishing. Legacy encodings
> > will dominate even in the future, because they are compact and inexpensive.
> >
> > If we want to make IDN truly internationally interoperable, all IDN-aware web browsers/applications
> > would have to contain libraries of all kinds of legacy-to-Unicode conversion routines. That would put
> > too heavy a memory load on handheld devices such as PDAs.
> >
> > Moreover, legacy encodings are revised separately from Unicode. We may face versioning problems
> > as tough as those we faced in stringprep/nameprep for newly added Unicode code points.
> > How can we guarantee the stability and integrity of IDN operations across all combinations of
> > the numerous kinds and versions of IDN-aware applications and legacy encodings?
> >
> > Soobok Lee
