It is of course possible that some areas do not accept it, much as the United States has not accepted the metric system (except for scientific work, and the important realm of soft-drink bottles). It is difficult to predict the speed of adoption of any technology, but I suspect you will be surprised at the situation in 5 or 10 years.
Mark
—————

Γνῶθι σαυτόν — Θαλῆς
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]

http://www.macchiato.com

----- Original Message -----
From: "Soobok Lee" <[EMAIL PROTECTED]>
To: "Mark Davis" <[EMAIL PROTECTED]>; "IETF idn working group" <[EMAIL PROTECTED]>
Sent: Friday, March 22, 2002 16:38
Subject: Re: OT - Re: [idn] URL encoding in html page

> Dear Mark,
>
> UNICODE will get more and more popular as time goes by.
> But that does not mean that legacy encodings will disappear or be obsoleted by UTF8.
> There are at least 2 reasons why legacy encodings will be around forever.
>
> 1. Most legacy codes are standardized by local *governments*, which are best qualified to
> find and reflect their local communities' character needs.
> For example, the Korean government has been constantly revising its KSX100? local legacy codes
> to include new graphic characters and new rarely used Chinese characters, even before
> UNICODE decided to include them.
> In other words, legacy codes are under the control of their language communities. UNICODE
> is not; it has its own schedules, principles and motivations.
> It may be *politically* impossible for legacy codes to be obsoleted by UNICODE.
> Can we imagine the Korean government publishing its laws and regulations in UNICODE, not in KSX100x?
>
> 2. Legacy encodings are already internationally interoperable in popular HTML/MIME content.
> There is no reason why owners of KSX100x-encoded homepages, or senders of KSX100x-encoded messages,
> should abandon legacy encodings and move to UTF8 at the cost of additional
> space and operational inefficiency, now or in the foreseeable future.
>
> I believe UNICODE is everywhere now and will be everywhere in the future. At the same time,
> UNICODE has given legacy encodings/codes more opportunities to be interoperable at
> minimum cost.
>
> Soobok Lee
>
> ----- Original Message -----
> From: Mark Davis
> To: Soobok Lee ; IETF idn working group
> Sent: Saturday, March 23, 2002 3:21 AM
> Subject: OT - Re: [idn] URL encoding in html page
>
> From my experience talking with customers in the field, the main reason that people are not serving up UTF-8 pages is not
> bandwidth; it is the fact that some browsers still in use do not yet handle UTF-8 correctly. While those browsers are
> dying off fairly quickly, they are not quite at the point where people are willing to write them off.
>
> As far as size goes, it is worthwhile looking at some data samples. The following are from a page on the Unicode site that is
> translated into different languages, so each version carries essentially the same information.
>
> Size (bytes)  Page
>         8882  s-chinese.html
>         8946  t-chinese.html
>         9347  esperanto.html
>         9498  maltese.html
>         9739  icelandic.html
>         9833  czech.html
>         9944  welsh.html
>        10064  danish.html
>        10109  swedish.html
>        10127  polish.html
>        10219  interlingua.html
>        10221  italian.html
>        10297  spanish.html
>        10308  portuguese.html
>        10312  lithuanian.html
>        10329  german.html
>        10376  romanian.html
>        10401  korean.html
>        10506  french.html
>        10726  japanese.html
>        10953  hebrew.html
>        11192  arabic.html
>        13292  greek.html
>        13870  russian.html
>        13892  persian.html
>        14549  hindi.html
>        15337  georgian.html
>        15853  deseret.html
>
> So the best case is about 50% of the worst case. Some of this is due to the encoding, and some is due to different languages just
> using different numbers of characters. However, when you look at web pages in general use, the amount of text (in bytes) is really
> swamped by graphics, JavaScript, HTML code, and so on. So fundamentally, even the variations above are not that important in
> practice.
>
> BTW, this is getting way off topic.
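Part of the spread in the table comes from the encoding and part from the languages themselves. The encoding effect can be isolated by encoding one short string both in a legacy charset and in UTF-8. The Python sketch below is an illustration only, not the measurement behind the table: the sample strings and the choice of EUC-KR, Shift_JIS and KOI8-R as the legacy charsets are assumptions made for the example. Hangul syllables and kanji take 2 bytes in those charsets and 3 in UTF-8 (the 50% increase mentioned in the thread), while Cyrillic letters take 1 byte in KOI8-R and 2 in UTF-8.

```python
# Sketch: compare the byte cost of the same text in a legacy charset and in UTF-8.
# The sample strings and the legacy codec chosen for each are illustrative assumptions.
SAMPLES = {
    "Korean": ("한국어", "euc_kr"),
    "Japanese": ("日本語", "shift_jis"),
    "Russian": ("Русский", "koi8_r"),
    "English": ("English", "ascii"),
}


def byte_sizes(text: str, legacy: str) -> tuple[int, int]:
    """Return (bytes in the legacy charset, bytes in UTF-8) for the same text."""
    return len(text.encode(legacy)), len(text.encode("utf-8"))


if __name__ == "__main__":
    for lang, (text, legacy) in SAMPLES.items():
        old, new = byte_sizes(text, legacy)
        growth = 100 * (new - old) / old  # percentage increase when moving to UTF-8
        print(f"{lang:8} {legacy:10} {old:3} -> {new:3} bytes (+{growth:.0f}%)")
```

The per-character ratios are fixed by the encodings, but as the message points out, markup, scripts and images usually dominate a real page's size.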
> > Mark > ————— > > Γνῶθι σαυτόν — Θαλῆς > [For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr] > > http://www.macchiato.com > > ----- Original Message ----- > From: "Soobok Lee" <[EMAIL PROTECTED]> > To: "Mark Davis" <[EMAIL PROTECTED]>; "IETF idn working group" <[EMAIL PROTECTED]> > Sent: Friday, March 22, 2002 08:16 > Subject: Re: [idn] URL encoding in html page > > > > > > ----- Original Message ----- > > From: "Mark Davis" <[EMAIL PROTECTED]> > > To: "Soobok Lee" <[EMAIL PROTECTED]>; "IETF idn working group" <[EMAIL PROTECTED]> > > Sent: Saturday, March 23, 2002 12:18 AM > > Subject: Re: [idn] URL encoding in html page > > > > > > > Compliant browsers already have to handle Unicode, since NCRs (e.g. > > > ሴ ) are always Unicode code points. All XML parsers also have > > > to handle Unicode (UTF-8 and UTF-16). > > > > Right, Already. > > MS IE and NEtscape already have been supporting UNICODE > > from serveral year ago, but still most homepages are in legacy encodings. > > MS WORD (already unicode based) have features to produce (from > > unicode-based .doc files) legacy encoded .html files for web publishing > > > > Korean/Japanese/Chinese texts in UTF8 are 50% bigger than legacy ones. > > 50% more disk space and bandwidth will be required. > > Each Cyrillic alhpabet in legacy code occupy one octet, while in UTF8, > > it requires 3 octets. 200% more space is needed. > > I cannot imagine the entire Russians make transition to UTF8. > > Legacy encnodings are more space efficient than UNICODE. > > > > legacy-to-legacy conversions like BIG5->KSX1001 are really being implemented > > as two steps of BIG5->UNICODE and UNICODE->KSX1001. UNICODE > > are actively used as such intermediate encodings, but still not be used and entered > > directly by end users so actively. Rather, UNICODE may be a hub to facilitate interchange > > of informations in different legacy encodings or font sharing for differently legacy-encoded chars. 
> >
> > I regard UNICODE as a substrate (not as a competitor) upon which legacy encodings are built.
> >
> > > > Legacy encodings
> > > > will dominate even in the future, because they are compact and
> > > > inexpensive.
> > >
> > > While I do expect the transition to Unicode to take some time, once
> > > some of the older browsers die off it may shift more rapidly than we
> > > think.
> >
> > I am neither a UNICODE expert nor a character expert. But every day I feel
> > the strong inertia toward legacy encodings in our local language communities.
> > Language-tagging-enabled text formats like HTML will greatly lengthen the lifespan
> > of legacy encodings and allow legacy-coded HTML texts
> > to be interchanged internationally without problems.
> >
> > Soobok Lee
> >
> > > Mark
> > >
> > > ----- Original Message -----
> > > From: "Soobok Lee" <[EMAIL PROTECTED]>
> > > To: "IETF idn working group" <[EMAIL PROTECTED]>
> > > Sent: Friday, March 22, 2002 02:04
> > > Subject: Re: [idn] URL encoding in html page
> > >
> > > > ----- Original Message -----
> > > > From: "Bruce Thomson" <[EMAIL PROTECTED]>
> > > > To: "Soobok Lee" <[EMAIL PROTECTED]>; "IETF idn working group" <[EMAIL PROTECTED]>
> > > > Sent: Friday, March 22, 2002 6:29 PM
> > > > Subject: Re: [idn] URL encoding in html page
> > > >
> > > > > > What if all the viewable HTML text is in English, but only the href URL contains
> > > > > > legacy (Korean) encoded hostnames? Chinese visitors would see a clean English homepage,
> > > > > > but fail to click through the Korean link.
> > > > >
> > > > > Well, that could happen, but a META tag would solve that so easily.
> > > > > Personally I often use a simple text editor to deal with HTML, and would find it easier to
> > > > > use legacy encodings or UTF-8 than to cut and paste ACE from somewhere.
> > > > > Of course the user could do it either way and it would work.
> > > >
> > > > Yes. Charset META tags help. But many homepages make assumptions about the main audience's
> > > > default character encoding and very often omit the META tag for the encoding, such as:
> > > > <meta http-equiv="Content-Type" content="text/html; charset=euc-kr">
> > > >
> > > > Moreover, an IDN URL might be used in a pure FRAMESET document that defines frame URLs
> > > > and contains no viewable text. Such FRAMESET documents often omit charset META tags
> > > > (look at the HTML source of http://www.freeway.co.kr/).
> > > >
> > > > AFAIK, 99.99999% of Korean homepages have an implicit or explicit
> > > > legacy Korean encoding (KS_C_5601-1987 or euc-kr). So do most Japanese/Chinese homepages.
> > > > UTF8/UCS-2 encodings are rarely used in global web publishing. Legacy encodings
> > > > will dominate even in the future, because they are compact and inexpensive.
> > > >
> > > > If we want to make IDN truly internationally interoperable, all IDN-aware web browsers/applications
> > > > should contain libraries of every kind of legacy-to-Unicode conversion routine. That would place
> > > > too heavy a memory load on handheld devices like PDAs.
> > > >
> > > > Moreover, legacy encodings are revised separately from Unicode. We
> > > > may face versioning problems as tough as those we faced in stringprep/nameprep
> > > > for newly added Unicode code points.
> > > > How can we guarantee the stability and integrity of IDN operations across
> > > > all combinations of the numerous kinds and versions of IDN-aware
> > > > applications and legacy encodings?
> > > >
> > > > Soobok Lee
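The thread's concern about missing charset declarations comes down to how a client picks a decoder for a page. The Python sketch below shows a rough version of the usual heuristic. It assumes the page is scanned for a META charset declaration and, when none is found or the label is unknown, decoded with an assumed default. Here that default is `euc-kr`, standing in for the audience's implicit encoding. The regex, the 2048-byte scan window and the function names are illustrative assumptions. This is not the full HTML5 encoding-sniffing algorithm, and it ignores any charset given in the HTTP `Content-Type` header.

```python
import codecs
import re

# Matches charset=... in <meta http-equiv="Content-Type" content="text/html; charset=...">
# and in the shorter <meta charset="..."> form. A heuristic, not a real HTML parser.
_META_CHARSET = re.compile(rb"""<meta[^>]+charset=["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE)


def sniff_charset(html: bytes, default: str = "euc-kr") -> str:
    """Return the codec name declared in a META tag, or `default` if none is usable."""
    match = _META_CHARSET.search(html[:2048])  # only look near the top of the page
    if match:
        label = match.group(1).decode("ascii")
        try:
            return codecs.lookup(label).name  # canonical Python codec name
        except LookupError:
            pass  # unknown label: fall back to the assumed default
    return default


def decode_page(html: bytes, default: str = "euc-kr") -> str:
    """Decode a page with its declared charset, or with the assumed default."""
    return html.decode(sniff_charset(html, default), errors="replace")


if __name__ == "__main__":
    labelled = ('<meta http-equiv="Content-Type" content="text/html; charset=euc-kr">'
                "<p>한국</p>").encode("euc_kr")
    frameset = "<frameset><frame src='a.html'></frameset>".encode("ascii")
    print(sniff_charset(labelled), sniff_charset(frameset, default="utf-8"))
```

A FRAMESET page like the second example carries no declaration, so its result depends entirely on the default the client assumes. That is the situation Soobok describes.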
