On Jun 7, 2010, at 11:44 AM, Warren Young wrote:

> On 6/7/2010 9:57 AM, Ryan Chan wrote:
>> http://dev.mysql.com/doc/refman/5.0/en/charset-unicode.html
>>
>> Since MySQL only support BMP, so in fact 16 bit is needed actually?
>
> I imagine they were thinking they'd extend the support to full Unicode in the
> future and didn't want you to have to dump and reload your databases when
> that happened. The Unicode consortium has stated that Unicode will never
> require more than 21 bits per character[*], and 24 bits is the next even
> multiple of 8 up from that.
>
> [*] Why 21? Because that's the maximum number of bits you can express in 4
> bytes with UTF-8 encoding. If Unicode were allowed to use all 2^32 code
> points as originally envisioned, it would require up to 6 bytes per character
> in UTF-8 encoding. This promise makes UTF-8 code easier to write and easier
> to future-proof without bad performance penalties.
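For anyone following the byte arithmetic in the quoted exchange, here is a minimal Python sketch (not from the thread) of how many bytes UTF-8 needs per code point, using the RFC 3629 ranges. It shows that every BMP character (up to U+FFFF) fits in 3 bytes, which is the per-character width the pre-5.5.3 MySQL utf8 character set reserves, and that the 4-byte form carries 3 + 6 + 6 + 6 = 21 payload bits, matching the 21-bit figure in the footnote. (The U+10FFFF ceiling itself is usually traced to what UTF-16 surrogate pairs can reach; RFC 3629 then capped UTF-8 at 4 bytes to match.) The helper names `utf8_length` and `needs_four_bytes` are hypothetical, chosen for illustration only.

```python
def utf8_length(code_point: int) -> int:
    """Bytes UTF-8 uses for one code point, per the RFC 3629 ranges."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if code_point < 0x80:
        return 1  # 7 payload bits
    if code_point < 0x800:
        return 2  # 5 + 6 = 11 payload bits
    if code_point < 0x10000:
        return 3  # 4 + 6 + 6 = 16 payload bits: the whole BMP
    return 4      # 3 + 6 + 6 + 6 = 21 payload bits: supplementary planes


def needs_four_bytes(text: str) -> bool:
    """True if any character lies outside the BMP (above U+FFFF)."""
    return any(ord(ch) > 0xFFFF for ch in text)


if __name__ == "__main__":
    # 'A', e-acute, euro sign, and an emoji from plane 1
    for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
        cp = ord(ch)
        # Cross-check the range table against Python's own UTF-8 encoder.
        assert utf8_length(cp) == len(ch.encode("utf-8"))
        print(f"U+{cp:04X}: {utf8_length(cp)} byte(s)")
    # The highest code point, U+10FFFF, needs exactly 21 bits.
    print((0x10FFFF).bit_length())
```

A check like `needs_four_bytes` is one way to spot text that a 3-byte-per-character column cannot hold before inserting it.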
Supplementary Unicode characters (those that need 4 bytes in UTF-8) are supported as of MySQL 5.5.3 through the utf8mb4 character set:

http://dev.mysql.com/doc/refman/5.5/en/charset-unicode.html
http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-upgrading.html

-- 
Paul DuBois
Oracle Corporation / MySQL Documentation Team
Madison, Wisconsin, USA
www.mysql.com