Yes, UTF-8/UCS4 is the ultimate solution for all of us. But why are most of us not using it? Why are most Chinese users still using GB2312/Big5?
The most important benefit of GB18030 is that GB2312/GBK users do not
need to convert their data and archives to another encoding to gain the
ability to process many different languages simultaneously, since
GB18030 is backward compatible with both. And once UTF-8/UCS4 is widely
used, they can convert their data from GB18030 to UTF-8 without losing
anything.

I think converting from GB2312/GBK to UTF-8 would cost most Chinese
users not just some CPU cycles but a lot of money and time. It is hard
work indeed! We cannot expect most GB2312/GBK users to switch to UTF-8
overnight, or even within a year!

Regards,
James Su

On 23 Jan 2001, zhaoway wrote:

> <[EMAIL PROTECTED]> writes:
>
> > GB18030 is in the same situation as UTF-8, because GB18030 covers
> > all of the code points in ISO10646. For example, you can convert a
> > BIG5 text to UCS4 and then convert to GB18030 without losing any
> > information. But you cannot make such a conversion between GB2312
> > and Big5.
>
> Oh, man, I convert Big5 into UTF-8, so suppose I've got zh_CN.UTF-8,
> then, Bingo! I can read it now, why should I convert it again to
> another encoding? For fun? The same goes for GB2312 too.
>
> All characters present in GB2312 and Big5 will be distinct in
> UTF-8 and will be displayed with a glyph from an -iso10646-?
> font. Would you tell me what is left unsatisfying?
>
> And all the new information, either from zh_TW.UTF-8 or from
> zh_CN.UTF-8, will be presented to you in good form. Why bother with
> GB18030 then? If you do, you will have to do an _extra_ conversion
> from zh_TW.UTF-8 or from zh_TW.Big5-compatible-new-standard to
> zh_CN.GB18030. Wouldn't _this_ be tiresome?
>
> > > Locale is not perfect here. Yes, ``ls'' and ``sh'' can understand
> > > GB18030 if you've got your locale straight. But think what will
> > > happen if you join an IRC channel with people from TW and CN
> > > together? The man on the other side of the Internet doesn't seem
> > > to understand your LC_ALL settings. 8-P
> >
> > If the other side of the internet does not use a UTF-8 locale,
> > he (she) cannot understand you either.
>
> The better chance is that they will know UTF-8 better than
> GB18030. The worse is that their own good government will also come
> out with their own whatever FORCEFUL-STD-12345 which will,
> interestingly, encode Chinese in another funny way because they also
> want to cover ISO-10646 and more, heh, Tower of Babel, then. Will you
> be happy with this? (And that is the Tower of Babel _after_ we've got
> Unicode. So, Unicode/ISO-10646 will only give us a _bigger_ Tower of
> Babel. Heh..)
>
> > If you have lots of archives encoded in GB2312 or GBK, how much
> > time will you spend to convert them into UTF-8?
>
> That is what computers are for, I guess. First of all, to convert from
> GB2312 into UTF-8, you will lose no information. Second, this task is
> computer-doable.
>
> With GB18030, it seems the only benefit (I may be wrong) is that you
> have no need to convert your GB2312 archive, to save your CPU cycles
> to run converters from UTF-8 and Big5 into GB18030 then, I guess. Fun!
> Fun! Fun! 8-)
>
> And remember we will still have [EMAIL PROTECTED] and
> [EMAIL PROTECTED] Oh, yes they have a big firewall, so they're not
> interested in this communication stuff really... 8-)
>
> --
> zhaoway

--
| This message was re-posted from [email protected]
| and converted from gb2312 to big5 by an automatic gateway.
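
For anyone who wants to check the round-trip claims made in this thread, here is a minimal sketch using Python's built-in codecs. It is only an illustration, not anything from the original posts: the helper names `transcode` and `fits` and the two sample strings are my own assumptions, and it says nothing about how a real archive conversion should be organized.

```python
# Sketch only: helper names and sample strings are hypothetical,
# chosen for illustration. Uses Python's standard codecs.

def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Decode bytes from encoding `src`, then re-encode them as `dst`."""
    return data.decode(src).encode(dst)


def fits(text: str, encoding: str) -> bool:
    """Return True if every character of `text` is encodable in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


SIMPLIFIED = "中文编码"   # all four characters exist in GB2312
TRADITIONAL = "繁體中文"  # contains 體, which GB2312 does not have

if __name__ == "__main__":
    gb2312_bytes = SIMPLIFIED.encode("gb2312")

    # Existing GB2312 data reads unchanged as GB18030 (no conversion step).
    print(gb2312_bytes.decode("gb18030") == SIMPLIFIED)

    # GB2312 -> UTF-8 -> GB2312 gives back the original bytes.
    utf8_bytes = transcode(gb2312_bytes, "gb2312", "utf-8")
    print(transcode(utf8_bytes, "utf-8", "gb2312") == gb2312_bytes)

    # Big5 -> GB18030 -> Big5 also gives back the original bytes.
    big5_bytes = TRADITIONAL.encode("big5")
    gb18030_bytes = transcode(big5_bytes, "big5", "gb18030")
    print(transcode(gb18030_bytes, "gb18030", "big5") == big5_bytes)

    # The same Big5 text does not fit in GB2312, but does fit in GB18030.
    print(fits(TRADITIONAL, "gb2312"), fits(TRADITIONAL, "gb18030"))
```

The sample covers only a handful of hanzi, so it shows the shape of both arguments (no conversion needed for old GB2312 data, and lossless conversion in either direction) rather than proving them for every code point.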

