Thomas Chan <[EMAIL PROTECTED]> writes:

> On 23 Jan 2001, zhaoway wrote:
>
> > So, again, my question is, what does GB18030 provide to us, which
> > cannot be solved with UTF-8, or Unicode surrogates? (The current
> > version of Unicode is not perfect, I agree. But there're no fixed
>
> Actually, I don't see much difference between surrogates in Unicode
> (especially UTF-16 encoding) and the four-byte-long codepoints in GB18030.
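Thomas's comparison can be made concrete: both UTF-16 surrogates and GB18030's four-byte codes reach the supplementary planes through a fixed arithmetic mapping of a 20-bit offset. The Python sketch below assumes GB18030's linear mapping of U+10000..U+10FFFF onto the four-byte range starting at 90 30 81 30; the function names are hypothetical, and the code only illustrates the arithmetic, it is not a reference implementation of either standard.

```python
def utf16_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = cp - 0x10000  # 20-bit offset into the supplementary planes
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def gb18030_four_byte(cp: int) -> bytes:
    """Map a supplementary code point to a GB18030 four-byte sequence.

    Assumption: U+10000..U+10FFFF map linearly onto 90 30 81 30 .. E3 32 9A 35,
    with byte ranges 0x81-0xFE, 0x30-0x39, 0x81-0xFE, 0x30-0x39.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = cp - 0x10000
    offset, b4 = divmod(offset, 10)   # 4th byte: 10 values (0x30-0x39)
    offset, b3 = divmod(offset, 126)  # 3rd byte: 126 values (0x81-0xFE)
    b1, b2 = divmod(offset, 10)       # 2nd byte: 10 values; rest goes to 1st byte
    return bytes([0x90 + b1, 0x30 + b2, 0x81 + b3, 0x30 + b4])


if __name__ == "__main__":
    cp = 0x20000  # first CJK Extension B ideograph
    hi, lo = utf16_surrogate_pair(cp)
    print(f"UTF-16:  {hi:04X} {lo:04X}")  # D840 DC00
    print(f"GB18030: {gb18030_four_byte(cp).hex().upper()}")  # 95328236
    # Cross-check both sketches against Python's built-in codecs.
    assert chr(cp).encode("utf-16-be") == bytes([hi >> 8, hi & 0xFF, lo >> 8, lo & 0xFF])
    assert chr(cp).encode("gb18030") == gb18030_four_byte(cp)
```

In both schemes the supplementary code point is recovered by pure arithmetic, which is the sense in which the two are alike.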
Unicode defines a sequence of logical characters. The actual numbers the Unicode standard uses to order that sequence are, frankly, not critical in most cases. What matters is that Unicode is defined in a compatible and evolvable way.

UTF-8 is a way to encode Unicode code point numbers as byte sequences that any well-behaved I18N application can recognize and tell apart, again in a compatible and evolvable way. Text in GB2312 or Big5 can also be re-encoded as UTF-8. And since UTF-8 is very clever, there would be no difficulty in extending it beyond 31-bit code points; indeed, nothing would prevent Unicode from growing to include more logical characters than 31 bits can hold. (Yes, I have to agree this is over-optimistic and simplistic.) All you need is _convention_. So _join_ the effort and agree on the convention with the other parties on _earth_, instead of trying to bully it through. That is called _selfish_, I guess.

So in the near future, we hope people all around the world will use Unicode and UTF-8 (maybe with an implementation number), and we can exchange information without bothering with the encoding stupidity. Either you recognize the small graphic (glyph) for a character, or you don't. You don't need to worry whether a Chinese character was encoded in a different encoding from the one you are using to read it.

Back to the topic. GB2312 is acceptable, because it predates Unicode and UTF-8. GB13000 is acceptable, because it is compatible with Unicode, and you can indeed use UTF-8 to encode it. GB18030 is weird: is it meant to be a definition of logical characters, or an actual encoding seen by applications? In the first case, its only value is to _push_ Unicode; to _compete_ with Unicode is not good, because that breaks interoperability. (Whee, I hate to see the competition between KDE and GNOME end in two different, non-interoperable component models!)
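The claims that GB2312 or Big5 text can be carried as UTF-8, and that UTF-8's bit layout stretches well past the current Unicode range, can be illustrated with a small Python sketch. It assumes the original RFC 2279 form of UTF-8, which allowed sequences of up to six bytes (31 bits); later revisions restrict UTF-8 to four bytes and U+10FFFF, so Python's own `utf-8` codec will not produce the five- and six-byte forms. The names `utf8_encode_31bit` and `reencode_as_utf8` are hypothetical, chosen for illustration.

```python
def utf8_encode_31bit(cp: int) -> bytes:
    """Encode one code point using the original 31-bit UTF-8 layout (RFC 2279).

    Assumption: surrogates are not rejected here; this only shows the bit layout.
    """
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("code point does not fit in 31 bits")
    if cp < 0x80:
        return bytes([cp])  # ASCII stays a single byte
    # (exclusive upper bound, sequence length, lead-byte prefix)
    forms = [
        (0x800, 2, 0xC0),
        (0x10000, 3, 0xE0),
        (0x200000, 4, 0xF0),
        (0x4000000, 5, 0xF8),
        (0x80000000, 6, 0xFC),
    ]
    length, prefix = next((n, p) for limit, n, p in forms if cp < limit)
    tail = []
    for _ in range(length - 1):
        tail.append(0x80 | (cp & 0x3F))  # each continuation byte carries 6 bits
        cp >>= 6
    return bytes([prefix | cp] + tail[::-1])  # leftover high bits go in the lead byte


def reencode_as_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode legacy bytes (e.g. GB2312 or Big5) to Unicode, then emit UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


if __name__ == "__main__":
    # Within U+10FFFF the sketch agrees with Python's built-in codec.
    for cp in (0x41, 0x4E2D, 0x20000):
        assert utf8_encode_31bit(cp) == chr(cp).encode("utf-8")
    print(utf8_encode_31bit(0x7FFFFFFF).hex())  # six-byte form: fdbfbfbfbfbf
    print(reencode_as_utf8("中文".encode("gb2312"), "gb2312").hex())  # e4b8ade69687
    print(reencode_as_utf8("中文".encode("big5"), "big5").hex())  # same UTF-8 bytes
```

The GB2312 and Big5 inputs differ byte for byte, yet both come out as the same UTF-8, which is the interoperability argued for above.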
In the second case, it's just a plain disaster: it amounts to designing a bigger GB2312 after all these years. So, back to the original question: what is the purpose of pushing GB18030 throughout our inter-application communication? Chaos.

-- zhaoway --

| This message was re-posted from [email protected]
| and converted from gb2312 to big5 by an automatic gateway.

