A couple of things I'm not sure about. What, exactly, does an application (or rather, its data formats) need to do to accommodate CJK text in Unicode, and text in other languages with similar ambiguities?
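
To make the question concrete, here is a minimal Python sketch of the simplest option I can think of: one language tag stored alongside each UTF-8 data element. The `TaggedText` type and its `to_html` helper are made up for illustration and are not part of any existing tag format. U+76F4 is used only because it is a commonly cited unified Han character whose preferred glyph differs between Japanese and Chinese fonts.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class TaggedText:
    """One data element: Unicode text plus a single language tag.

    Hypothetical type for illustration only.
    """
    text: str
    lang: str  # a language tag such as "ja" or "zh-TW"

    def to_html(self) -> str:
        # Emit the text with a lang attribute so a renderer can pick
        # a language-appropriate font for unified Han characters.
        return (
            f'<span lang="{escape(self.lang, quote=True)}">'
            f'{escape(self.text)}</span>'
        )


# The same code point, tagged as Japanese and as Traditional Chinese.
ja = TaggedText("\u76f4", "ja")
zh = TaggedText("\u76f4", "zh-TW")

# The encoded bytes are identical, so the language cannot be recovered
# from the UTF-8 data alone; it has to be carried next to the text.
assert ja.text.encode("utf-8") == zh.text.encode("utf-8")

print(ja.to_html())
print(zh.to_html())
```

Changing language mid-sentence would need a list of such runs per element instead of a single tag, which is the extra storage and rendering complexity I'm asking about.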
Is knowing the language enough? (For example, is it enough in HTML to write UTF-8 and use the LANG attribute?) Is it generally important or useful to be able to change language mid-sentence? (It's much simpler to store a single language for a whole data element, and it's much easier to render.)

A couple of people on a Vorbis list are suggesting allowing RFC 2047 encoding in Ogg tags, to let people use encodings other than UTF-8, as a "fix" for these problems. One of them appears to consider Unicode currently useless for real-world data exchange in CJK, and believes this to be a consensus among Asian users.

I think RFC 2047 is a fairly horrible solution. An alternative is simply to store the language of the text; is that sufficient, or are there deeper problems?

What other languages have similar problems? Something was mentioned about Russian as well. What fixes do they need?

--
Glenn Maynard
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
