S***, it seems I made a mistake. Font selection in Windows 2000 is not at all as flexible as in Java; it's more like Linux. It's just that the default font in the Simplified Chinese version is still Tahoma instead of Song Ti.
Jungshik must be right that I could change the default font in the zh_CN locale to make ASCII characters appear nicer. The only problem is that the standard locale for Simplified Chinese in Red Hat 8.0 (which I use) is zh_CN.GB18030. I was told that it was possible to change that to zh_CN.UTF-8, but I have not found the motivation or time to do that.

Regarding the 'A' APIs in Windows: do you mean that there should be some API to change the interpretation of strings in the 'A' APIs (especially regarding file names, etc.)? If that were the case, the OS would have to speak Unicode in some form internally. In my previous message I interpreted your talk about UTF-8 in the 'A' APIs as meaning that everything would be encoded in UTF-8 (instead of the language-specific encodings), which I thought could not have been acceptable at the time of Windows 95.
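For illustration, here is a rough sketch (not taken from any Windows documentation; the function name and fixed-size buffer are placeholders) of how an 'A' entry point can sit on top of the corresponding 'W' entry point on NT-class Windows. The char*-based interface itself does not care which codepage the bytes are in; only the conversion step does, so the same interface could in principle carry UTF-8 simply by converting with CP_UTF8 instead of the ANSI codepage.

    #include <windows.h>

    /* Sketch of an 'A'-style wrapper over a 'W' API (made-up name). */
    HANDLE OpenForReadingA(const char *name_a)
    {
        wchar_t name_w[MAX_PATH];

        /* CP_ACP is the process's ANSI codepage (936/GBK on Simplified
           Chinese Windows); substituting CP_UTF8 would let the same
           char* parameter carry UTF-8. Error checks omitted for brevity. */
        MultiByteToWideChar(CP_ACP, 0, name_a, -1, name_w, MAX_PATH);

        return CreateFileW(name_w, GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    }
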
When talking about the file system, I really like NTFS much better. The POSIX file system is *too* simple. I hate the fact that when I switch from en_US.UTF-8 to zh_CN.GB18030, file names with characters beyond ASCII become corrupt. If the file is on a Windows partition, it is possible to remount the partition with the appropriate encoding; if it is on an EXT2/3 partition or on a CD-ROM, then I am out of luck. Maybe the mount tool should do something to handle this? :-)

Best regards,

Wu Yongwei

--- Original Message from Jungshik Shin ---

On Thu, 10 Jul 2003, Wu Yongwei wrote:

> Jungshik Shin wrote:
>
> > I think it's not so much due to defects in programs as due to the lack of
> > high-quality fonts. These days, most Linux distributions come with free
> > truetype fonts for zh, ja, ko, th and other Asian scripts. However,
> > the number and the quality of fonts for the Linux desktop are still
> > inferior to those for Windows.
>
> The problem is mainly not the font itself, but the font combination. I really
> cannot bear the display of ASCII characters in Song Ti, which is simply ugly
> (and fixed width).

Why don't you specify a variable-width font as the system default? I understand you still don't like Latin glyphs in Chinese fonts. I hate Latin glyphs in Korean fonts, too.

> locale Linux seems to be able to do so, but in the Chinese locale all is in
> the Chinese font, which is not suitable at all for Latin characters.

I don't think there's any difference between English and Chinese locales, provided that you meant en_*.UTF-8 and zh_*.UTF-8. You may get the impression that it works under en_US.UTF-8 because the 'system default font' for en_US.UTF-8 does not cover Chinese characters, so the automatic font selection mechanism picks up a Chinese font for Chinese characters while using the default font for Latin letters. On the other hand, in zh_*.UTF-8 the system default font covers Latin letters as well as Chinese characters, so both Latin and Chinese are rendered with the default font. A way to work around this is to specify your favorite Latin font ahead of your Chinese font wherever a CSS-style font list can be used.

> Beginning with Windows 2000, Windows could choose the
> font to use based on the Unicode range (Java does this too). In the English

This is a good feature to have, although a CSS-style font list works most of the time. Almost everything we need for this is already in place (fontconfig, pango). BTW, I haven't seen this available in Win2k. How can I do that? (Not that I don't believe you, but I'm curious.)

> I used a Windows Gtk application, which used Tahoma (a good sans-serif
> font) at first. But after an upgrade it automatically chose to use the
> system default font, which is the Chinese Song Ti. It took me several hours
> to "correct" the ugly and corrupt (yes, because dialogue dimensions are
> different) display.

Again, I haven't run Gtk programs under Win32, so I don't know how they select fonts. Do they use fontconfig? fontconfig can make a big difference.

> >> There seems little sense now arguing the virtues of UTF-8 and UTF-16.
> >> Technically they both have advantages and disadvantages. I suppose we
>
> > If MS had decided to use UTF-8 (instead of coming up with a whole new
> > set of APIs for UTF-16) with 'A' APIs, Mozilla developers' headache (and ....
> > UTF-8/'A' APIs vs UTF-16/'W' APIs and there are many other things to
> > consider in case of Win32.
>
> It seems impossible because there are so many legacy applications. On the
> Simplified Chinese versions of Windows, 'A' always implies GB2312/GBK.
> Switching ALL to UTF-8 seems too radical an idea around 1994. At the time

Using 'A' APIs with UTF-8 does not mean that 'A' APIs are made to work ONLY with UTF-8. As you know well, 'A' APIs are basically APIs that deal with 'char *'. As such, in theory, they can be used with any single-byte or multibyte encoding, including Windows 932, 936, 949, 950 and 6xxxx (I forgot the codepage designation for UTF-8). As Unix (e.g. Solaris and AIX, and to a lesser degree Linux) demonstrated, a single application (written to support multibyte encodings) can work well both under legacy-encoding-based locales and under UTF-8 locales.
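To make that concrete, here is a minimal sketch of such locale-agnostic code (my own illustration, assuming a POSIX-ish C library; the command-line argument is just an example). It counts the characters in its argument using only the current locale's multibyte conversion, so the same source works unchanged under zh_CN.GB18030, other legacy-encoding locales, and UTF-8 locales.

    #include <locale.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        size_t n;

        /* Adopt the user's locale, e.g. zh_CN.GB18030 or zh_CN.UTF-8. */
        setlocale(LC_ALL, "");

        if (argc < 2) {
            fprintf(stderr, "usage: %s <text in the locale encoding>\n", argv[0]);
            return 1;
        }

        /* Count characters without knowing which multibyte encoding the
           locale uses; a NULL destination only asks for the length. */
        n = mbstowcs(NULL, argv[1], 0);
        if (n == (size_t)-1) {
            perror("mbstowcs");
            return 1;
        }
        printf("%lu characters\n", (unsigned long)n);
        return 0;
    }

The only per-encoding knowledge lives in the C library's locale machinery, which is exactly why one binary can serve both kinds of locales.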
> Microsoft adopted Unicode, people might truly believe UCS-2 is enough for
> most applications, and Microsoft did not have the file-name compatibility
> burden that exists in Unix

Well, this is an orthogonal issue. The POSIX file system is so 'simple' (which is a virtue in some respects) that it doesn't have an inherent notion of codeset/encoding/charset. However, Windows doesn't use the POSIX file system, and using 'A' APIs does NOT mean that they couldn't use VFAT or NTFS, where filenames are stored in a form of Unicode.

> (I suppose you all know that the long file names in Windows are in
> UTF-16).

Actually, VFAT documentation is so hard to come by that we can only speculate that it's UTF-16 (it could well be just UCS-2 in Windows 95).

> I would not blame Microsoft for this.

I wouldn't either, and I didn't mean to. I believe they weighed all the pros and cons of the different options and decided to go with their two-tiered API approach. In my previous message, I just gave a downside to that approach, aggregating all the other arguments into the single phrase 'there are many other things to consider.....'

> Also consider the following
> fact: Windows 95 emerged at a time when many people had only 8MB of RAM.
> Yah, I don't think AT THAT TIME we could tolerate a 50% growth in memory
> occupation.

Windows 95/98/ME are not Unicode-enabled in many senses, while Win 2k/XP (NT4 to a lesser degree) are [1]. Therefore, it was not an issue for Win95 in 1994/95 simply because Win95 still used legacy encodings.

[1] Win 9x/ME is rather like a POSIX system running under locales with legacy encodings, whereas Win 2k/XP is similar to a POSIX system running under UTF-8 locales.

Jungshik

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/