Hi Jay, all.

> The simplicity we've reached from narrowing to only support UTF8 is
> mainly manifested in reduction of the parser and if adding pluggable
> charsets back into the server increases the complexity of the parser
> again, it's going to be a tough sell, particularly to Brian (and me and
> others..)

Still, I can't escape the impression that if you allow "everything" to
be pluggable, then the features offered by the plugins still need to be
addressable through the SQL dialect (or other language) understood by
the server. In other words - is it feasible to allow a plugin to extend
the language spoken by the server, and have the parser dispatch the
appropriate bits to the modules/plugins that know how to deal with them?
Another example I mentioned in the past are the various engine-specific
SQL statements and table options...

Any thoughts? Is this crazy?

> Cheers, and thanks for the input!
>
> Jay
>
>>> Cheers,
>>>
>>> Jay
>>>
>>> Bernt M. Johnsen wrote:
>>>>>>>>>>>>>>>> Roy Lyseng wrote (2008-09-30 08:33:16):
>>>>> Another approach would be to create a database in either UTF-8 or
>>>>> UTF-16 character set. UTF-16 obviously provides a better storage
>>>>> utilization with some Asian locales.
>>>>>
>>>>> Technically speaking UTF-8 and UTF-16 are different encodings of
>>>>> the same character set, so the internal impact of allowing both
>>>>> would be minimal (but still significant). And the conversion
>>>>> between the two is rather trivial.
>>>>>
>>>>> An added advantage of UTF-16 is that all characters are fixed size,
>>>>> so it is easy to calculate space of character string given the
>>>>> number of characters.
>>>> Nitpicking: Not quite, some characters will be represented by
>>>> surrogate pairs so it's not that easy to calculate space after all if
>>>> you were to be strictly UTF-16 compliant. There are now (Unicode 5.0)
>>>> assigned "CJK Unified Ideographs Extension B" in SIP (Supplemental
>>>> Ideographic Plane) in the range 0x20000-0x2a6df and 0x2a700-0x2fa1f.
>>>>
>>>> But as long as we stick to BMP (Basic Multilingual Plane) Roy's
>>>> assumption will hold.
>>>>
>>>> And of course I agree with Roy. Do support UTF-8, UTF-16 and maybe
>>>> UTF-32 too.
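
To make the surrogate-pair point in the quoted discussion concrete, here
is a minimal Python sketch. It is only an illustration, not anything from
the Drizzle code base: the helper names utf16_code_units and
encoded_sizes are made up for this example. It compares the storage
needed for the same text in UTF-8, UTF-16 and UTF-32, and shows that the
"fixed size" assumption for UTF-16 holds inside the BMP but not for a
code point in the Supplemental Ideographic Plane.

```python
def utf16_code_units(text: str) -> int:
    """Count the 16-bit code units needed to encode text as UTF-16.

    Hypothetical helper for illustration only.
    """
    # Code points above U+FFFF lie outside the BMP and need a surrogate
    # pair (two code units); everything else needs one code unit.
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in text)


def encoded_sizes(text: str) -> dict:
    """Return the character count and byte sizes in three encodings."""
    # The "-le" codec variants are used so that no byte order mark is
    # added to the encoded output.
    return {
        "chars": len(text),
        "utf8": len(text.encode("utf-8")),
        "utf16": len(text.encode("utf-16-le")),
        "utf32": len(text.encode("utf-32-le")),
    }


bmp_text = "\u65e5\u672c\u8a9e"   # three CJK characters, all in the BMP
sip_text = "\U00020000"           # first code point of CJK Extension B

print(encoded_sizes(bmp_text))
print(encoded_sizes(sip_text))
print(utf16_code_units(bmp_text + sip_text))
```

For the BMP sample, UTF-16 needs two bytes per character against three
for UTF-8, which is the storage advantage Roy mentions. For the single
Extension B character, all three encodings need four bytes, so the byte
length of a UTF-16 string cannot be derived from the character count
alone once such characters are allowed.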

-- 
Roland Bouman
http://rpbouman.blogspot.com/

_______________________________________________
Mailing list: https://launchpad.net/~drizzle-discuss
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help   : https://help.launchpad.net/ListHelp

