Hi Jay, all.

> The simplicity we've reached from narrowing to only support UTF8 is
> mainly manifested in reduction of the parser and if adding pluggable
> charsets back into the server increases the complexity of the parser
> again, it's going to be a tough sell, particularly to Brian (and me and
> others..)

Still, I can't escape the impression that if you allow "everything" to
be pluggable, then these features offered by the plugins still need to
be addressable through the SQL dialect (or other language) understood
by the server. In other words - is it feasible to allow a plugin to
extend the language spoken by the server, and have the parser dispatch
the appropriate bits to the modules/plugins that know how to deal with
them?

Another example I mentioned in the past is the set of engine-specific
SQL statements and table options...

Any thoughts? Is this crazy?

>
> Cheers, and thanks for the input!
>
> Jay
>
>>>
>>> Cheers,
>>>
>>> Jay
>>>
>>> Bernt M. Johnsen wrote:
>>>> Roy Lyseng wrote (2008-09-30 08:33:16):
>>>>> Another approach would be to create a database in either UTF-8 or
>>>>> UTF-16 character set. UTF-16 obviously provides a better storage
>>>>> utilization with some Asian locales.
>>>>>
>>>>> Technically speaking UTF-8 and UTF-16 are different encodings of
>>>>> the same character set, so the internal impact of allowing both
>>>>> would be minimal (but still significant). And the conversion
>>>>> between the two is rather trivial.
>>>>>
>>>>> An added advantage of UTF-16 is that all characters are fixed size,
>>>>> so it is easy to calculate space of character string given the
>>>>> number of characters.
>>>> Nitpicking: Not quite, some characters will be represented by
>>>> surrogate pairs so it's not that easy to calculate space after all if
>>>> you were to be strictly UTF-16 compliant. There are now (Unicode 5.0)
>>>> assigned "CJK Unified Ideographs Extension B" in SIP (Supplemental
>>>> Ideographic Plane) in the range 0x20000-0x2a6df and 0x2a700-0x2fa1f.
>>>>
>>>> But as long as we stick to BMP (Basic Multilingual Plane) Roy's
>>>> assumption will hold.
>>>>
>>>> And of course I agree with Roy. Do support UTF-8, UTF-16 and maybe
>>>> UTF-32 too.
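
As a small illustration of the surrogate pair point quoted above, here
is a sketch that computes the UTF-16 storage size of a string. The
helper names utf16_code_units() and utf16_bytes() are mine, chosen for
this example only. It shows that "number of characters times two" holds
for BMP text but not for characters in the supplementary planes.

```python
def utf16_code_units(text: str) -> int:
    # Code points above U+FFFF are encoded as a surrogate pair,
    # which takes two 16-bit code units instead of one.
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in text)


def utf16_bytes(text: str) -> int:
    # Each UTF-16 code unit occupies two bytes (no BOM counted).
    return 2 * utf16_code_units(text)


bmp = "\u4e2d\u6587"          # two ideographs inside the BMP
sip = "\U00020000\U00020001"  # two ideographs from Extension B (SIP)

print(len(bmp), utf16_bytes(bmp))  # 2 characters, 4 bytes
print(len(sip), utf16_bytes(sip))  # 2 characters, 8 bytes
```

So a column sized as two bytes per character would be too small for the
second string, even though both strings are two characters long.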
>
>
> _______________________________________________
> Mailing list: https://launchpad.net/~drizzle-discuss
> Post to     : [email protected]
> Unsubscribe : https://launchpad.net/~drizzle-discuss
> More help   : https://help.launchpad.net/ListHelp
>



-- 
Roland Bouman
http://rpbouman.blogspot.com/

