Re: [Drizzle-discuss] Toru's thoughts on UTF8 and CJK charsets

Jay Pipes Tue, 30 Sep 2008 08:34:17 -0700

Roland Bouman wrote:
> Hi Jay, all.
> 
>> The simplicity we've reached from narrowing to only support UTF8 is
>> mainly manifested in reduction of the parser and if adding pluggable
>> charsets back into the server increases the complexity of the parser
>> again, it's going to be a tough sell, particularly to Brian (and me and
>> others..)
> 
> Still, I can't escape the impression that if you allow "everything" to
> be pluggable, then these features offered by the plugins still need to
> be addressable through the SQL dialect (or other language) understood
> by the server. In other words - is it feasible to allow a plugin to
> extend the language spoken by the server, and have the parser dispatch
> the appropriate bits to the modules/plugins that know how to deal with
> them?
> 
> Another example I mentioned in the past are the various engine
> specific SQL statements and table options...
> 
> Any thoughts? Is this crazy?

Not crazy at all, Roland, and this is one area where the plugin API
*must* be refactored.  For instance, it's easy enough to have a
pluggable function register itself in a HASH of functions which the
server then may query during parsing.  But, what about function
arguments?  Should the parser trap incorrectly formed function calls
during parsing, or should the function itself throw an error
post-parsing, once it is passed an incorrect number of arguments?
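
To make the two options concrete, here is a minimal sketch in Python
(not Drizzle code). The names FUNCTION_REGISTRY, register_function,
check_call_at_parse_time and call_at_execution_time are hypothetical,
and a plain dict stands in for the server's HASH of functions. It
assumes a plugin can declare its argument count when it registers:

```python
from typing import Callable, Dict, NamedTuple, Sequence


class PluginFunction(NamedTuple):
    impl: Callable[..., object]
    min_args: int
    max_args: int


# Hypothetical stand-in for the server-side HASH of pluggable functions.
FUNCTION_REGISTRY: Dict[str, PluginFunction] = {}


def register_function(name: str, impl: Callable[..., object],
                      min_args: int, max_args: int) -> None:
    """Called by a plugin at load time; names are case-insensitive here."""
    FUNCTION_REGISTRY[name.upper()] = PluginFunction(impl, min_args, max_args)


def lookup(name: str) -> PluginFunction:
    entry = FUNCTION_REGISTRY.get(name.upper())
    if entry is None:
        raise LookupError(f"unknown function: {name}")
    return entry


def check_call_at_parse_time(name: str, arg_count: int) -> PluginFunction:
    """Option A: the parser traps a malformed call using the declared arity."""
    entry = lookup(name)
    if not entry.min_args <= arg_count <= entry.max_args:
        raise SyntaxError(
            f"{name} expects {entry.min_args}..{entry.max_args} "
            f"arguments, got {arg_count}")
    return entry


def call_at_execution_time(name: str, args: Sequence[object]) -> object:
    """Option B: the parser only resolves the name; the function complains."""
    return lookup(name).impl(*args)


def reverse_string(*args: object) -> str:
    # The plugin validates its own arguments after parsing.
    if len(args) != 1:
        raise ValueError("REVERSE takes exactly one argument")
    return str(args[0])[::-1]


register_function("reverse", reverse_string, min_args=1, max_args=1)
```

Option A needs arity metadata in the plugin API; option B keeps the
parser ignorant but only reports the error once the query runs.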

Whether this is a limitation of our existing parser or of our plugin
API, I'm not sure.  Similarly, as you point out, the engine and
table options...right now, I've implemented them as a repeated string
field in the Drizzled::StorageEngine GPB-generated class.  Should the
parser be aware of which storage engine supports which option, or should
the storage engine handler (or GPB wrapper definition class) return
whether an option is supported?  Currently, my opinion is that the
parser should do as little as possible and let the plugin determine if
the passed query fragment is valid...
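
A minimal sketch of that division of labour, in Python rather than
Drizzle code. ExampleEngine, its supported_options set and the helper
parse_table_options are hypothetical; the list of raw strings only
mimics the idea of a repeated string field:

```python
from typing import Dict, List, Tuple


def parse_table_options(fragment: str) -> List[str]:
    """Parser side: split 'A=1, B=2' into raw strings, knowing no engine."""
    return [part.strip() for part in fragment.split(",") if part.strip()]


class ExampleEngine:
    """Hypothetical engine plugin that knows which options it supports."""

    name = "EXAMPLE"
    supported_options = {"ROW_FORMAT", "BLOCK_SIZE"}

    def validate_options(
            self, raw_options: List[str]) -> Tuple[Dict[str, str], List[str]]:
        """Engine side: return (accepted options, rejected raw strings)."""
        accepted: Dict[str, str] = {}
        rejected: List[str] = []
        for raw in raw_options:
            key, sep, value = raw.partition("=")
            key = key.strip().upper()
            if sep and key in self.supported_options:
                accepted[key] = value.strip()
            else:
                rejected.append(raw)
        return accepted, rejected
```

The parser never consults a per-engine table here; it forwards the
strings, and the server can turn a non-empty rejected list into an
error.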

As for "extending the SQL syntax" I think this can and should be
possible, especially considering plugins are expected to "extend" the
server environment.  However, can our existing parser handle this?  Not
sure...
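
One simple shape this could take is dispatch on the leading keyword, as
in the Python sketch below. It is an illustration only: the names
STATEMENT_HANDLERS, register_statement, CORE_KEYWORDS and dispatch are
hypothetical, and a grammar-driven parser would likely need more than
this to hand fragments over to plugins:

```python
from typing import Callable, Dict

# Hypothetical table of plugin-provided statement handlers.
STATEMENT_HANDLERS: Dict[str, Callable[[str], str]] = {}

# Statements the core parser keeps for itself in this sketch.
CORE_KEYWORDS = {"SELECT", "INSERT", "UPDATE", "DELETE"}


def register_statement(keyword: str, handler: Callable[[str], str]) -> None:
    """Called by a plugin to claim a leading keyword."""
    STATEMENT_HANDLERS[keyword.upper()] = handler


def dispatch(statement: str) -> str:
    """Route a statement to the core parser or to the owning plugin."""
    words = statement.strip().split(None, 1)
    if not words:
        raise SyntaxError("empty statement")
    keyword = words[0].upper()
    if keyword in CORE_KEYWORDS:
        return f"core parser handles {keyword}"
    handler = STATEMENT_HANDLERS.get(keyword)
    if handler is None:
        raise SyntaxError(f"no parser or plugin understands {keyword}")
    # The plugin receives the rest of the statement as an opaque fragment.
    rest = words[1] if len(words) > 1 else ""
    return handler(rest)
```

The core stays small because it only needs to know who owns a keyword,
not what the fragment means.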

-j

>> Cheers, and thanks for the input!
>>
>> Jay
>>
>>>> Cheers,
>>>>
>>>> Jay
>>>>
>>>> Bernt M. Johnsen wrote:
>>>>> Roy Lyseng wrote (2008-09-30 08:33:16):
>>>>>> Another approach would be to create a database in either UTF-8 or
>>>>>> UTF-16 character set. UTF-16 obviously provides a better storage
>>>>>> utilization with some Asian locales.
>>>>>>
>>>>>> Technically speaking UTF-8 and UTF-16 are different encodings of
>>>>>> the same character set, so the internal impact of allowing both
>>>>>> would be minimal (but still significant). And the conversion
>>>>>> between the two is rather trivial.
>>>>>>
>>>>>> An added advantage of UTF-16 is that all characters are fixed size,
>>>>>> so it is easy to calculate space of character string given the
>>>>>> number of characters.
>>>>> Nitpicking: Not quite, some characters will be represented by
>>>>> surrogate pairs so it's not that easy to calculate space after all if
>>>>> you were to be strictly UTF-16 compliant. There are now (Unicode 5.0)
>>>>> assigned "CJK Unified Ideographs Extension B" in SIP (Supplemental
>>>>> Ideographic Plane) in the range 0x20000-0x2a6df and 0x2a700-0x2fa1f.
>>>>>
>>>>> But as long as we stick to BMP (Basic Multilingual Plane) Roy's
>>>>> assumption will hold.
>>>>>
>>>>> And of course I agree with Roy. Do support UTF-8, UTF-16 and maybe
>>>>> UTF-32 too.
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~drizzle-discuss
>> Post to     : [email protected]
>> Unsubscribe : https://launchpad.net/~drizzle-discuss
>> More help   : https://help.launchpad.net/ListHelp
>>
> 
> 
> 

