----- Original Message -----
From: Markus Kuhn <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, June 21, 2001 6:35 AM
Subject: Re: Vietnamese
> On Wed, 20 Jun 2001, Pablo Saratxaga wrote:
> > The combining characters in tcvn-5712 are much like those in iso10646:
> > the preferred way is to use precomposed chars.
>
> I don't think TCVN-5712 VN2 has enough precomposed characters to
> represent Vietnamese without combining characters adequately, nor does any
> other precomposed 8-bit encoding. Vietnamese orthography (Quoc ngu) really
> requires 134 additional precomposed characters to ASCII, but ASCII leaves
> only 128, so you have to throw out 6 characters at least, and you still
> lack all the symbols that users of 8-bit codes have become accustomed to
> (copyright, mu, exponents, etc.) for daily correspondence. VSCII and
> TCVN-5712 put the additional characters into the C0 region (yes, that is
> how bad it is!), where most Linux software will not be able to display
> them. Before Microsoft switched to Unicode, they introduced WINDOWS-1258
> for Vietnamese, which is completely based on combining characters but
> offers in exchange a decent symbol set.
>
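The figure of 134 quoted above can be checked mechanically. The following is only a minimal sketch in Python: it assumes the usual Quoc ngu inventory of 12 vowel letters, each with or without one of 5 tone marks, plus d-with-stroke, in both cases. The names BASE_VOWELS, TONE_MARKS and vietnamese_letters are made up for this illustration.

```python
import unicodedata

# The 12 Vietnamese vowel letters: a, a-breve, a-circumflex, e, e-circumflex,
# i, o, o-circumflex, o-horn, u, u-horn, y (written as escapes to stay NFC).
BASE_VOWELS = "a\u0103\u00e2e\u00eaio\u00f4\u01a1u\u01b0y"
# No tone mark, then combining grave, acute, hook above, tilde, dot below.
TONE_MARKS = ["", "\u0300", "\u0301", "\u0309", "\u0303", "\u0323"]


def vietnamese_letters():
    """Return every precomposed letter in the assumed inventory, both cases."""
    lower = {
        unicodedata.normalize("NFC", vowel + tone)
        for vowel in BASE_VOWELS
        for tone in TONE_MARKS
    }
    lower.add("\u0111")  # d with stroke
    return lower | {ch.upper() for ch in lower}


letters = vietnamese_letters()
# Letters that ASCII does not already provide.
non_ascii = {ch for ch in letters if ord(ch) > 127}

print(len(non_ascii))        # 134
print(len(non_ascii) - 128)  # 6 more than the upper half of an 8-bit code holds
```

Under those assumptions the count comes out at 134, which is 6 more than the 128 positions above ASCII, before any room is left for symbols.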
The implementations of TCVN-5712 and VISCII that I have used in the past work
only because they leave some of the capital letters out of the font. Each font
comes as part of a set. The first font contains all the lowercase characters
and most (but not all) of the uppercase characters; the second font contains
only uppercase characters, placed where the lowercase characters would be.
So it is not really possible to convert a text file to or from these
character sets. Without markup, there is no way to tell whether a lowercase
character is meant to be a lowercase or an uppercase character. The usual
usage is that it is lowercase, but there is no information in a text file
to be absolutely sure.
Use of Unicode makes life a lot simpler.
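As a small illustration, and only a sketch, Python's unicodedata module can move between the precomposed and combining spellings of the same word without losing anything. The example also uses cp1258, Python's codec name for WINDOWS-1258; the variable names are made up for this illustration.

```python
import unicodedata

word = "Vi\u1ec7t"  # "Viet" with e-circumflex-dot-below as one code point

# Fully decomposed form: e + combining dot below + combining circumflex.
decomposed = unicodedata.normalize("NFD", word)
recomposed = unicodedata.normalize("NFC", decomposed)

# WINDOWS-1258 has no precomposed code for this letter...
try:
    word.encode("cp1258")
    precomposed_fits = True
except UnicodeEncodeError:
    precomposed_fits = False

# ...but it can carry e-circumflex followed by a combining dot below.
partly_composed = "Vi\u00ea\u0323t"
raw = partly_composed.encode("cp1258")
restored = unicodedata.normalize("NFC", raw.decode("cp1258"))

print(len(word), len(decomposed), precomposed_fits, restored == word)
```

Both spellings normalize to the same string, and case is carried by the code points themselves, so no companion font is needed to recover it.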
Andj.
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/