Unicode, character ambiguities

Glenn Maynard Tue, 08 Jan 2002 19:59:26 -0800

A couple of things I'm not sure about.

What, exactly, needs to be done by an application (or rather, its data
formats) to accommodate CJK in Unicode (and other languages with similar
ambiguities)?


Is knowing the language enough?  (For example, is it enough in HTML to
write UTF-8 and use the LANG attribute?)

Is it generally important or useful to be able to change language mid-
sentence?  (It's much simpler to store a single language for a whole data
element, and it's much easier to render.)

A couple of people on a Vorbis list are suggesting allowing RFC 2047
encoding in Ogg tags, to let people use encodings other than UTF-8, as a
"fix" for these problems.  One of them appears to consider Unicode
currently useless for real-world data exchange in CJK, and believes this
to be a consensus among Asian users.  I think RFC 2047 is a fairly horrible
solution.  An alternative is simply to store the language of the text;
is that sufficient, or are there deeper problems?
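
To make the two options concrete, here is a rough Python sketch, not a
proposal for the actual tag format.  It uses the standard email.header
module for the RFC 2047 side.  The tagged_comment helper and the
"_LANG" field suffix are made up for illustration and are not part of
any Vorbis comment specification.  The sample character U+76F4 is one
whose preferred glyph differs between Japanese and Chinese.

```python
from email.header import Header, decode_header


def rfc2047_roundtrip(text, charset):
    """Encode text as an RFC 2047 encoded-word, then decode it again."""
    encoded = Header(text, charset).encode()
    raw, found_charset = decode_header(encoded)[0]
    return encoded, raw.decode(found_charset)


def tagged_comment(field, text, lang):
    """Hypothetical scheme: UTF-8 text plus a separate language field."""
    return {field: text, field + "_LANG": lang}


sample = "\u76f4"  # a Han-unified character

# Option 1: the legacy charset travels with the text, and the reader has
# to infer the language from it.
encoded, decoded = rfc2047_roundtrip(sample, "iso-2022-jp")

# Once decoded to Unicode, the charset hint is gone: the result is the
# same code point a Chinese source would have produced.
same_code_point = (decoded == sample)

# Option 2: keep the text in UTF-8 and state the language explicitly.
comment = tagged_comment("TITLE", sample, "ja")
```

In the first case the language is only implied by the charset name and
is lost as soon as the text is converted to Unicode; in the second it
is stored explicitly, at the cost of one language per field.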

What other languages have similar problems?  Something was mentioned
about Russian, as well.  What fixes do they need?

-- 
Glenn Maynard
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
