On Wed, 20 Jun 2001, Pablo Saratxaga wrote:
> The combining characters in tcvn-5712 are much like those in iso10646:
> the preferred way is to use precomposed chars.

I don't think TCVN-5712 VN2 has enough precomposed characters to
represent Vietnamese adequately without combining characters, nor does any
other precomposed 8-bit encoding. Vietnamese orthography (Quoc ngu) really
requires 134 precomposed characters in addition to ASCII, but an 8-bit code
leaves only 128 positions beyond ASCII, so you have to throw out at least 6
characters, and you still
lack all the symbols that users of 8-bit codes have become accustomed to
(copyright, mu, exponents, etc.) for daily correspondence. VSCII and
TCVN-5712 put the additional characters into the C0 region (yes, that is
how bad it is!), where most Linux software will not be able to display
them.  Before Microsoft switched to Unicode, they introduced WINDOWS-1258
for Vietnamese, which is completely based on combining characters but
offers in exchange a decent symbol set.
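The figure of 134 can be checked with a short Python sketch. It assumes the usual inventory of 12 base vowels, 5 tone marks and the letter d with stroke; the names BASE_VOWELS, TONE_MARKS and vietnamese_non_ascii_letters are mine and purely illustrative.

```python
import unicodedata

# Assumed inventory: the 12 Vietnamese base vowels and the 5 tone marks
# (grave, acute, hook above, tilde, dot below) as combining characters.
BASE_VOWELS = "aăâeêioôơuưy"
TONE_MARKS = "\u0300\u0301\u0309\u0303\u0323"


def vietnamese_non_ascii_letters():
    """Return the sorted Vietnamese letters that are not already in ASCII."""
    letters = set()
    for base in BASE_VOWELS + BASE_VOWELS.upper():
        # The empty string stands for "no tone mark".
        for mark in ("",) + tuple(TONE_MARKS):
            letters.add(unicodedata.normalize("NFC", base + mark))
    letters.update("đĐ")
    # Drop the plain ASCII vowels; everything else needs a slot above 0x7F.
    return sorted(s for s in letters if max(map(ord, s)) > 0x7F)


if __name__ == "__main__":
    needed = vietnamese_non_ascii_letters()
    free_slots = 256 - 128  # positions an 8-bit code has beyond ASCII
    print(len(needed), "letters needed,", free_slots, "slots available")
    print("shortfall:", len(needed) - free_slots)
```

Every composed letter comes out of NFC as a single code point, which is the sense in which ISO 10646 has all of them precomposed.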

The existing 8-bit sets are all a compromise solution for a language that
really doesn't fit into an 8-bit encoding. As far as I know, ISO 10646 is
the only standard in use that encodes Romanized Vietnamese completely
without requiring combining characters. It is therefore the encoding of
Vietnamese that we can support most easily in the widest range of
applications under Linux. There are really not many Vietnamese Linux users
today, but the few that I know have seemed highly enthusiastic about UTF-8
so far and apparently use it quite actively with mined, yudit, emacs,
Mozilla, etc.
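To illustrate the difference, here is a rough Python sketch. It relies on Python's standard unicodedata module and its built-in "cp1258" codec; the helper names has_combining_marks and encodable are my own, and the sample phrase is just an example.

```python
import unicodedata


def has_combining_marks(text):
    """True if any character in text is a combining mark."""
    return any(unicodedata.combining(ch) for ch in text)


def encodable(text, codec):
    """True if text can be encoded in codec without errors."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


sample = "Tiếng Việt"
decomposed = unicodedata.normalize("NFD", sample)
precomposed = unicodedata.normalize("NFC", sample)

if __name__ == "__main__":
    # NFC needs no combining marks and survives a UTF-8 round trip.
    print(has_combining_marks(precomposed))
    print(precomposed.encode("utf-8").decode("utf-8") == precomposed)
    # The same precomposed text is not directly representable in
    # WINDOWS-1258, which expects base letter plus combining tone mark.
    print(encodable(precomposed, "cp1258"))
    print(encodable("e\u0301", "cp1258"))
```

Note that Python's codec does no decomposition by itself, so a converter to WINDOWS-1258 has to split the tone marks off first.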

I think you waste their time and create false hopes if you get your
Vietnamese customers excited about any of the more than a dozen available
single-byte encoding proposals. It's not the way to go.

> In any case, TCVN-5712 and TIS-620 *will* be implemented in GNU/Linux,
> simply because they already are in use since some years, are national
> standard in their respective countries, have a big number of users,

Bigger than 50 on POSIX systems and happy with what they have? I doubt it.
Reality check.

> (they are the exchange encoding with MacOS/Windows people as far as I know)

Both of these platforms will convert it back to Unicode internally anyway
and can just as easily produce and accept Vietnamese in UTF-8. No problem
here. You just risk losing information if you exchange Vietnamese text
via TCVN-5712 that was entered normally with Word/Outlook/Frontpage in
Unicode.

Markus


-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/