On Wednesday, February 6, 2002, at 11:06 PM, hoho wrote:
> This is true for TC/SC problem.  But, the variant problem roots in the
> long history of Han characters.  You might be interested in taking a
> peek at the problem at one of the Chinese or Japanese dictionaries on
> Han variants.

As Ken well knows.  It's a nasty, nasty problem in general, and one
which Unicode is trying to address.  At its last meeting the UTC agreed
to request that the IRG start to work on definitive variant data for
the CJK repertoire of Unicode/10646.

If there is any body in the IT standardization community which should
own the problem, it's the IRG.  It's international, multilingual, and
has the longest and best experience with the characters involved.

>> B. The move to Unicode implementations means that mingling
>> of traditional and simplified orthographies is easier.
>> In effect, users now have the rope to hang themselves,
>> if they so desire. Whereas, before, the constraints of
>> the deployment of IME's and fonts generally meant that
>> you couldn't easily mix SC/TC, even when the code page
>> nominally supported it.
>
> I would accept A with minor modification as follows.  B is not true.

I would disagree here.  As you point out, Windows 2000 has the ability
to mingle traditional and simplified orthographies because of Unicode
2.0.  Mac OS X now has the problem, too, because of Unicode 3.1.

In the past on the Mac, if you wanted to do Japanese you used the
Japanese "code page," if you wanted to do traditional Chinese, you did
the traditional Chinese "code page," and if you wanted to do simplified
Chinese, you used the simplified Chinese "code page."  This inherent
link between code page and nuanced version of a script was something
Unicode intended to break.

In the past, one had to deliberately work to mix the two, and the
mixture was generally obvious.  This isn't true now.
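
To make the code-page point concrete, here is a minimal sketch in Python 3. The language is my choice for illustration, and the helper name `encodable` and the sample character pair are assumptions of mine, not anything from the thread. It only shows that Unicode assigns the traditional and simplified forms separate code points that sit side by side in one string, and lets you compare that with what two legacy encodings will accept.

```python
import unicodedata

# Illustrative pair: traditional and simplified forms of "country".
TRADITIONAL = "\u570b"  # 國
SIMPLIFIED = "\u56fd"   # 国


def encodable(text: str, codec: str) -> bool:
    """Hypothetical helper: True if every character in text exists in codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


mixed = TRADITIONAL + SIMPLIFIED

# Unicode gives the two forms separate code points, so one string holds both.
print([f"U+{ord(ch):04X}" for ch in mixed])  # ['U+570B', 'U+56FD']

# Compatibility normalization does not fold one form into the other.
same_after_nfkc = (
    unicodedata.normalize("NFKC", TRADITIONAL)
    == unicodedata.normalize("NFKC", SIMPLIFIED)
)
print(same_after_nfkc)  # False

# Compare a Unicode encoding with two legacy code pages.
for codec in ("utf-8", "big5", "gb2312"):
    print(codec, encodable(mixed, codec))
```

Nothing here detects or resolves variants. Deciding that the two characters are "the same" for some purpose still needs the kind of variant data discussed in this thread.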
> My personal opinion after consulting several experts in Han characters
> is to find an international organization, e.g., Unicode Consortium, to
> host the standardization of variants.  They are also willing to
> collaborate with CJK experts from other countries and regions.  Some
> of their suggestions are described in the "phased implementation"
> draft.

Again, this is really the IRG's job.  The main problem with handing it
off to the IRG is that they are not the fastest-moving standards body
in history.

Another part of the problem is that the data *is* out there.  Ken
mentions the Sanseido dictionary.  I've got a CD of data from Taiwan on
Chinese variants, and the HKSAR government has a Web page.  I know that
MS has data they're sitting on which is used in Office.  Unicode's data
has been derived in the past by character set analysis, but we are
currently working on incorporating data from a commercial product,
Wenlin, which has been donated to us.

Unfortunately, most of the people who have spent the time and energy to
develop this data don't want to contribute it to the public gratis,
which is what would happen if it went into Unicode.  Even if we simply
said that we wanted to take over the data from an authoritative source
(Sanseido, the Hanyu Da Zidian), we'd probably have to get legal
permission to do so.

And there's the additional problem that nobody has a clear model, at
least that I've seen, that captures the full plethora of variant
"types" with a reasonable taxonomy and sufficient clarity that it can
be used in simple, algorithm-generated, lexical-analysis-free
situations.

> Without a internationalized dictionary for Han variants, the current
> IDN proposals are bringing side-effects to users and holders of
> domain names of Han characters.

I agree with Ken that it's not clear how *anything* could both include
hanzi in IDN *and* be free from unpleasant side effects.

> I appreciate your loving for Japanese.  But, if you do care about
> Japanese, why did you ignore Chinese and Taiwanese, and perhaps
> even more silent Japanese and Korean out there?

I'm sure it wasn't intentional.  I believe that Ken's point is that you
can't do a Chinese-only solution here.  (And I'll naturally bring up
the Cantonese speaking population of the world, as just adding to the
mess.)

==========
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jenkins/
