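: Anthony [EMAIL PROTECTED]:)
: Some gb2312 characters can be written in several different ways in big5; in
: such cases only converting word by word [EMAIL PROTECTED] a prototype.
: [EMAIL PROTECTED]<->big5 phrase mapping table. For the gb2312 word-segmentation
: dictionary I am currently using the phrases bundled with unicon-im, and a big5
: dictionary should be available in xcin. However, these dictionaries carry no
: part-of-speech information :(, so we will just have to make do with them. I do
: not plan to call iconv from autoconvert for now, because not every platform
: uses glibc. :)
: [EMAIL PROTECTED]
:
: Yu Guanghui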
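The word-by-word conversion described in the quoted message can be illustrated with a greedy longest-match lookup: at each position the converter first tries phrase (Tsi) entries and only falls back to a per-character mapping when no phrase matches. This is a minimal sketch, not autoconvert's actual code. The tables, their entries, and the names `convert_phrases` and `utf8_char_len` are hypothetical, and UTF-8 is used instead of raw gb2312/big5 bytes only so the entries stay readable. The character 发 is a typical one-to-many case: it becomes 髮 in 头发 (hair) but 發 in 发展 (development).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical phrase (Tsi) and character tables, for illustration only.
   A real converter would load these from dictionaries such as the
   unicon-im phrase list; UTF-8 is used here just to keep them readable. */
struct mapping {
    const char *from;   /* simplified form */
    const char *to;     /* traditional form */
};

static const struct mapping phrase_table[] = {
    { "头发", "頭髮" },  /* hair: here 发 must become 髮 */
    { "发展", "發展" },  /* development: here 发 must become 發 */
};

static const struct mapping char_table[] = {
    { "头", "頭" },
    { "发", "發" },      /* default choice when no phrase matches */
};

#define COUNT(a) (sizeof (a) / sizeof (a)[0])

/* Byte length of the UTF-8 character that starts with `lead`. */
static size_t utf8_char_len(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;            /* invalid lead byte: treat it as one byte */
}

/* Hypothetical greedy longest-match conversion of NUL-terminated `in`
   into `out`. Returns 0 on success, -1 if `out_cap` is too small. */
int convert_phrases(const char *in, char *out, size_t out_cap)
{
    size_t left = strlen(in);   /* input bytes not yet consumed */
    size_t used = 0;            /* output bytes written so far */

    if (out_cap == 0)
        return -1;

    while (left > 0) {
        size_t consumed = 0;     /* input bytes matched at this position */
        const char *rep = NULL;  /* replacement; NULL means copy input */

        /* 1. Prefer the longest phrase starting at this position. */
        for (size_t i = 0; i < COUNT(phrase_table); i++) {
            size_t n = strlen(phrase_table[i].from);
            if (n > consumed && n <= left &&
                memcmp(in, phrase_table[i].from, n) == 0) {
                consumed = n;
                rep = phrase_table[i].to;
            }
        }

        /* 2. Otherwise map a single character, or copy it unchanged. */
        if (consumed == 0) {
            consumed = utf8_char_len((unsigned char)*in);
            if (consumed > left)
                consumed = left;
            for (size_t i = 0; i < COUNT(char_table); i++) {
                if (strlen(char_table[i].from) == consumed &&
                    memcmp(in, char_table[i].from, consumed) == 0) {
                    rep = char_table[i].to;
                    break;
                }
            }
        }

        size_t rep_len = rep ? strlen(rep) : consumed;
        if (used + rep_len >= out_cap)   /* keep room for the final NUL */
            return -1;
        memcpy(out + used, rep ? rep : in, rep_len);
        used += rep_len;
        in += consumed;
        left -= consumed;
    }
    out[used] = '\0';
    return 0;
}
```

Because phrases are tried before single characters, the same 发 comes out as 髮 in 头发 and as 發 in 发展, while a lone 发 falls back to the per-character default. Greedy matching is the simplest strategy; it can choose a wrong split when two dictionary phrases overlap.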
Not exactly. iconv is a standard facility on all kinds of modern UNIX systems,
including FreeBSD, HP-UX, Solaris, etc., and most of them can convert between
different character sets and UTF-8. However, it is not guaranteed that they can
convert between big5 and gb2312. If they can't, that should be treated as a
bug.

But judging from your post, your project goes beyond what iconv does :-))
Yes, you are right: for the specific case of traditional and simplified
Chinese, we should write a dedicated program to handle the complex conversion
rather than leave it to iconv. That includes character set mapping, Tsi
(phrase) mapping, etc.

In any case, I think we should also have a reliable iconv that can at least do
the simple mapping between big5 and gb2312. Even though many gb2312 characters
can map to several big5 characters, that does not matter; we just need a
simple, commonly available interface for it. At the very least we should not
run into unconvertible characters (which in fact ought to be convertible), as
happens today. I think this is the goal of implementing the iconv module for
big5 <==> gb2312. So, if gb2312 contains a few characters that are only
available in big5+, I propose that these characters be ignored, unless in the
future we want glibc to support big5+ :-))

: > Before I left for vacation, I was also working on a gb <==> big5
: > gconv module. The first part of my plan was to establish a "best" mapping
: > between gb and big5. I did not take any existing conversion table because
: > none of them documented how they got their conversions, and I don't feel
: > comfortable with that. So I rolled my own and took this opportunity to check
: > a few popular gb <==> big5 converters. Most of this work has been finished.
: > All the gb -> big5 conversions have been checked, but there are some
: > big5 -> gb conversions left. The result so far looks good.
: > Compared with the table of 130+ unmapped gb codes posted by Yu Guanghui a
: > while ago, 35 of them are mapped in my table. There are 4 codes that are not
: > mapped in my table but are mapped in autoconvert; however, I suspect that
: > autoconvert made a mistake in all 4 cases. I'll write a more detailed post
: > describing my methodology, the conversion table and the comparison results
: > in the next few days. Then I'd like to hear from you. If we all agree on it,
: > it's fairly easy to write the module. Hopefully it will be in time for the
: > 2.2.1 release, which is said to be soon.

Thanks very much for your work :-)) I would be glad to help with development
and testing. :-))

T.H.Hsieh

