Roy Lyseng wrote (2008-09-30 08:33:16):
> Another approach would be to create a database in either UTF-8 or UTF-16
> character set. UTF-16 obviously provides a better storage utilization
> with some Asian locales.
>
> Technically speaking UTF-8 and UTF-16 are different encodings of the
> same character set, so the internal impact of allowing both would be
> minimal (but still significant). And the conversion between the two is
> rather trivial.
>
> An added advantage of UTF-16 is that all characters are fixed size, so
> it is easy to calculate space of character string given the number of
> characters.

Nitpicking: Not quite. Some characters are represented by surrogate pairs, so it is not that easy to calculate space after all if you want to be strictly UTF-16 compliant. As of Unicode 5.0 there are assigned characters ("CJK Unified Ideographs Extension B") in the SIP (Supplementary Ideographic Plane) in the ranges 0x20000-0x2a6df and 0x2a700-0x2fa1f. But as long as we stick to the BMP (Basic Multilingual Plane), Roy's assumption will hold.

And of course I agree with Roy. Do support UTF-8, UTF-16 and maybe UTF-32 too.

--
Bernt Marius Johnsen, Staff Engineer
Database Technology Group, Sun Microsystems, Trondheim, Norway
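
To illustrate the surrogate-pair point, here is a small Python sketch that is not part of the original thread. The helper name `encoded_sizes` and the two sample characters are chosen purely for illustration. It compares the storage needed for one BMP ideograph and one SIP ideograph in each encoding, using the byte-order-specific codecs so that no BOM is counted.

```python
def encoded_sizes(text: str) -> dict:
    """Return the code point count and byte length of text per encoding (no BOM)."""
    return {
        "code_points": len(text),
        "utf8": len(text.encode("utf-8")),
        "utf16": len(text.encode("utf-16-le")),
        "utf32": len(text.encode("utf-32-le")),
    }


bmp_char = "\u4e2d"      # U+4E2D, a CJK ideograph inside the BMP
sip_char = "\U00020000"  # U+20000, first code point of CJK Extension B (SIP)

# One UTF-16 code unit (2 bytes) is enough for the BMP character.
print(encoded_sizes(bmp_char))
# The SIP character needs a surrogate pair, i.e. two code units (4 bytes).
print(encoded_sizes(sip_char))
```

So "bytes = 2 x number of characters" only works for UTF-16 while the data stays inside the BMP. Of the three encodings, only UTF-32 uses a fixed number of bytes per code point.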

