Hi Jay, Toru, all,

Personally, I like the concept of "pluggable character sets and collations" much better than simply rejecting all local encodings (such as EUC-JP and CP932).
I agree that UTF-8 is widely used in many applications (not only web applications), but there are cases where local encodings are better, especially for text-oriented applications.

Example: A couple of months ago I checked the data size of the Japanese Wikipedia (UTF-8 based). The size was 2700MB. When I converted it to a local encoding (EUC-JP), the size was 2013MB. In the Wikipedia case, UTF-8 is 34% larger than the local encoding. This is clearly very important for certain types of applications. Not many people want to buy additional disks or servers to implement the same functionality.

Please also do not forget about collations, which sometimes need special consideration.

Character sets and collations are currently tightly coupled within MySQL. This is not good because:
- Adding a character set or collation to MySQL currently requires modifying the MySQL source code, which is not acceptable in most cases.
- Supporting a lot of character sets and collations is not easy. For example, non-Japanese database engineers have difficulty supporting Japanese character sets.

So, I like the "pluggable character set and collation" concept.
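As a rough illustration of where the size difference comes from: kana and common kanji take 3 bytes each in UTF-8 but 2 bytes each in EUC-JP, while ASCII takes 1 byte in both. The sketch below uses Python's standard codecs; the sample string and the helper name are my own choices for illustration, and only the 2700MB and 2013MB figures come from the Wikipedia measurement above.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, eucjp_bytes) for the given text."""
    return len(text.encode("utf-8")), len(text.encode("euc_jp"))

# Hypothetical sample: 8 characters, all kana/kanji.
sample = "日本語のテキスト"
utf8_size, eucjp_size = encoded_sizes(sample)
print(utf8_size, eucjp_size)  # 24 16

# Overhead computed from the Wikipedia figures quoted above (in MB).
wiki_utf8_mb = 2700
wiki_eucjp_mb = 2013
overhead_pct = (wiki_utf8_mb - wiki_eucjp_mb) / wiki_eucjp_mb * 100
print(round(overhead_pct))  # 34
```

Pure Japanese text would be 50% larger in UTF-8. The Wikipedia figure is lower, presumably because the dump also contains ASCII markup, which is the same size in both encodings.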
Such a design could include, for example:
- UTF-8 as the default character set
- Exposing a pluggable interface for additional server-side (and possibly client-side) character sets and collations
- External developers can create character code conversion maps (e.g. EUC-JP <-> Unicode)
- External developers can write collation maps (e.g. utf8_jis_x_4061_1996)
- If the client encoding and the server (column) encoding are the same, character code conversion does not happen (same as current MySQL)
- (Optional) If the client encoding and the server (column) encoding differ, character code conversion happens (same as current MySQL)

Regards,
----
Yoshinori Matsunobu
Senior MySQL Consultant
Sun Microsystems
MySQL Consulting Services: http://www-jp.mysql.com/consulting/

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, September 30, 2008 1:45 AM
> To: drizzle-discuss; Yoshinori Matsunobu
> Subject: Toru's thoughts on UTF8 and CJK charsets
>
> Hi Yoshi, all!
>
> Toru has outlined some thoughts about UTF8 and CJK charsets and
> standardizing drizzle on UTF8 here:
>
> http://torum.net/2008/09/utf8-over-cjk-drizzle/
>
> We'd very much like to get people's input and reactions to
> these ideas.
>
> Cheers,
>
> Jay

