Tomohiro KUBOTA:

> Because the algorithm transliterations is not very good.
I know.

> And, many people in the world have to use a small subset of softwares
> only because such softwares support their native languages.

We're talking about web pages here; the only software that needs Unicode support is the browser, and most browsers do have it (to varying degrees).

> Oh, very good. Please note that east Asian will need not only display
> support but also input support, i.e., XIM support.

Yes, I'm very aware of that as well (although my direct experience with IMs is limited). I have worked on the Unicode adaptation of our browser for over a year.

> (note there is a rival; ISO-2022 is a multilingual encoding scheme
> with much longer history).

Yeah, and it's a mess, to be honest. This kind of "state-driven" (for lack of a better word) encoding, where you cannot easily resynchronize mid-stream (as you can with UTF-8), is not something I like. The same goes for HZ, which is just a "simplified" form of ISO-2022.

> However, for _one_ language (most of Debian web pages are written in
> one language, with a small portion of links to other languages),
> usage of legacy encodings is better, because of plenty of supporting
> softwares, fonts, and so on, so far.

We only need to support one kind of software, and that is web browsers. When it comes to fonts, the underlying encoding of the document should *really* have no say in which fonts are used to display the contents (even though I am aware that Netscape 4 does such evil things). Having the underlying encoding be Unicode makes things easier in a lot of cases (no need to transcode) and makes it possible to interchange content between the languages, for example when writing names of people, companies or places. Just look at what we have to do with the things that are included from .data files on this web site: we must use entities whenever there is a non-ASCII character.

> I am also wrestling with a problem that Unicode doesn't have a
> relyable mapping table from/to Japanese legacy encodings.
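To make the resynchronization point concrete, here is a minimal Python sketch (the `resync` helper is my own illustration, not any library API). In UTF-8 every byte is self-identifying, so a decoder can recover from corruption by skipping continuation bytes; a stateful scheme like ISO-2022 gives no such guarantee, since the meaning of a byte depends on the last escape sequence seen.

```python
def resync(data: bytes, pos: int) -> int:
    """Return the index of the next UTF-8 character boundary at or after pos.

    UTF-8 lead bytes match 0xxxxxxx or 11xxxxxx; continuation bytes
    match 10xxxxxx. Skipping continuation bytes is all it takes to
    resynchronize -- no decoder state needs to be rewound.
    """
    while pos < len(data) and (data[pos] & 0xC0) == 0x80:
        pos += 1
    return pos

encoded = "héllo".encode("utf-8")   # b'h\xc3\xa9llo'
# Index 2 points into the middle of 'é' (continuation byte 0xa9);
# resync steps forward to the next character boundary.
assert resync(encoded, 2) == 3
assert encoded[resync(encoded, 2):].decode("utf-8") == "llo"
```

With ISO-2022 the equivalent operation requires scanning backwards for the last charset-switching escape sequence, which is exactly the property that makes it painful to implement.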
That's because of some poor design in the legacy encodings, not in Unicode: some characters have multiple mappings.

> See http://www.debian.or.jp/~kubota/unicode-symbols.html for detail.

Yes, I have read similar reasoning before. However, many of the problems you describe are caused by the legacy encodings, not by Unicode. Unicode tries to solve them by defining one unambiguous encoding, whereas today there are several ambiguous legacy encodings. Take the "backslash vs. yen" problem of Shift-JIS versus EUC-JP/ISO-2022-JP: boy, is that a headache to implement!

Also, the width issue is really a non-issue on graphical systems, with their proportional fonts. And it was a hack to start with: "two encoding bytes = double display width" doesn't even hold true for EUC-JP, which has half-width characters of two bytes (half-width kana, the SS2 set) and full-width characters of three bytes (the SS3 set, JIS X 0212).

-- 
\\// peter - http://www.softwolves.pp.se/

I do not read or respond to mail with HTML attachments.

Statement concerning unsolicited e-mail according to Swedish law:
http://www.softwolves.pp.se/peter/reklampost.html
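The byte-count-versus-display-width mismatch above can be demonstrated with Python's standard unicodedata module (the `display_width` helper is a simplified sketch of my own; it ignores combining characters and other edge cases).

```python
import unicodedata

def display_width(s: str) -> int:
    """Rough terminal-column estimate: Wide/Fullwidth count as 2, rest as 1."""
    return sum(2 if unicodedata.east_asian_width(c) in ("W", "F") else 1
               for c in s)

halfwidth_kana = "\uFF76\uFF85"   # half-width katakana KA, NA
fullwidth_kana = "\u30AB\u30CA"   # full-width katakana KA, NA

# In EUC-JP half-width kana are encoded as SS2 (0x8E) plus one byte,
# so both strings take 2 bytes per character...
assert len(halfwidth_kana.encode("euc_jp")) == 4
assert len(fullwidth_kana.encode("euc_jp")) == 4

# ...yet they occupy different numbers of display columns,
# so byte count cannot be used to derive display width.
assert display_width(halfwidth_kana) == 2
assert display_width(fullwidth_kana) == 4
```

This is precisely why "two bytes = double width" was never a safe assumption, even before Unicode entered the picture.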

