Roland Bouman wrote:
> Hi Jay, all.
>
>> The simplicity we've reached from narrowing to only support UTF8 is
>> mainly manifested in reduction of the parser and if adding pluggable
>> charsets back into the server increases the complexity of the parser
>> again, it's going to be a tough sell, particularly to Brian (and me and
>> others..)
>
> Still, I can't escape the impression that if you allow "everything" to
> be pluggable, then these features offered by the plugins still need to
> be addressable through the SQL dialect (or other language) understood
> by the server. In other words - is it feasible to allow a plugin to
> extend the language spoken by the server, and have the parser dispatch
> the appropriate bits to the modules/plugins that know how to deal with
> them?
>
> Another example I mentioned in the past are the various engine
> specific SQL statements and table options...
>
> Any thoughts? Is this crazy?

Not crazy at all, Roland, and this is one area where the plugin API
*must* be refactored.

For instance, it's easy enough to have a pluggable function register
itself in a HASH of functions which the server then may query during
parsing. But what about function arguments? Should the parser trap
incorrectly formed function calls during parsing, or should the function
itself throw an error post-parsing, once it is passed an incorrect
number of arguments? Whether this is a limitation of our existing
parser or of our plugin API, I'm not sure.

Similarly, as you point out, the engine and table options... Right now,
I've implemented them as a repeated string field in the
Drizzled::StorageEngine GPB-generated class. Should the parser be aware
of which storage engine supports which option, or should the storage
engine handler (or GPB wrapper definition class) return whether an
option is supported?

Currently, my opinion is that the parser should do as little as possible
and let the plugin determine if the passed query fragment is valid...

As for "extending the SQL syntax", I think this can and should be
possible, especially considering plugins are expected to "extend" the
server environment. However, can our existing parser handle this? Not
sure...

-j

>> Cheers, and thanks for the input!
>>
>> Jay
>>
>>>> Cheers,
>>>>
>>>> Jay
>>>>
>>>> Bernt M. Johnsen wrote:
>>>>> Roy Lyseng wrote (2008-09-30 08:33:16):
>>>>>> Another approach would be to create a database in either UTF-8 or
>>>>>> UTF-16 character set. UTF-16 obviously provides a better storage
>>>>>> utilization with some Asian locales.
>>>>>>
>>>>>> Technically speaking UTF-8 and UTF-16 are different encodings of
>>>>>> the same character set, so the internal impact of allowing both
>>>>>> would be minimal (but still significant). And the conversion
>>>>>> between the two is rather trivial.
>>>>>>
>>>>>> An added advantage of UTF-16 is that all characters are fixed size,
>>>>>> so it is easy to calculate space of character string given the
>>>>>> number of characters.
>>>>> Nitpicking: Not quite, some characters will be represented by
>>>>> surrogate pairs so it's not that easy to calculate space after all if
>>>>> you were to be strictly UTF-16 compliant. There are now (Unicode 5.0)
>>>>> assigned "CJK Unified Ideographs Extension B" in SIP (Supplemental
>>>>> Ideographic Plane) in the range 0x20000-0x2a6df and 0x2a700-0x2fa1f.
>>>>>
>>>>> But as long as we stick to BMP (Basic Multilingual Plane) Roy's
>>>>> assumption will hold.
>>>>>
>>>>> And of course I agree with Roy. Do support UTF-8, UTF-16 and maybe
>>>>> UTF-32 too.
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~drizzle-discuss
>> Post to     : [email protected]
>> Unsubscribe : https://launchpad.net/~drizzle-discuss
>> More help   : https://help.launchpad.net/ListHelp
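
To make the function-registration question in this thread concrete, here is a rough sketch of the "parser does as little as possible" option. It is written in Python for brevity, not the server's own language, and every name in it (FunctionRegistry, PluginError, register, is_known, call, the "repeat" function) is hypothetical; none of it is taken from the Drizzle plugin API. The parser would only ask whether a name is registered, and the argument-count check would happen post-parsing, on the plugin side.

```python
# Hypothetical illustration only; these names are not part of Drizzle.

class PluginError(Exception):
    """Raised after parsing, when a plugin function rejects a call."""


class FunctionRegistry:
    """A hash of pluggable functions that the parser can query by name."""

    def __init__(self):
        self._functions = {}  # lower-cased name -> (callable, argument count)

    def register(self, name, func, arity):
        self._functions[name.lower()] = (func, arity)

    def is_known(self, name):
        # The only question the parser asks in this design.
        return name.lower() in self._functions

    def call(self, name, args):
        # Argument validation is deferred until the call is executed.
        func, arity = self._functions[name.lower()]
        if len(args) != arity:
            raise PluginError(
                f"{name} expects {arity} argument(s), got {len(args)}"
            )
        return func(*args)


registry = FunctionRegistry()
registry.register("repeat", lambda s, n: s * n, arity=2)

print(registry.is_known("REPEAT"))          # parser-time lookup
print(registry.call("repeat", ["ab", 3]))   # well-formed call
try:
    registry.call("repeat", ["ab"])         # wrong argument count
except PluginError as exc:
    print("rejected post-parsing:", exc)
```

The alternative raised in the thread, trapping malformed calls during parsing, would need the parser to read the registered argument count itself, which is exactly the kind of extra parser knowledge this design tries to avoid.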

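
The surrogate-pair nitpick quoted in this thread is easy to check. The sketch below uses Python's built-in codecs; the helper name utf16_bytes and the two sample characters are choices made for this example, not something from the thread. It compares the storage needed for one BMP ideograph and for the first code point of CJK Unified Ideographs Extension B.

```python
def utf16_bytes(text):
    """Bytes needed to store text as UTF-16 without a byte-order mark."""
    return len(text.encode("utf-16-le"))


bmp_char = "\u4e2d"        # a CJK ideograph inside the BMP
ext_b_char = "\U00020000"  # first code point of CJK Extension B (in the SIP)

for ch in (bmp_char, ext_b_char):
    print(
        f"U+{ord(ch):04X}:",
        "utf-8 =", len(ch.encode("utf-8")),
        "utf-16 =", utf16_bytes(ch),
        "utf-32 =", len(ch.encode("utf-32-le")),
    )
```

The BMP character takes 3 bytes in UTF-8 and 2 in UTF-16, which is the storage advantage mentioned for some Asian locales. The Extension B character takes 4 bytes in both, so a UTF-16 string's size is only "2 x number of characters" as long as every character stays within the BMP.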
