Re: [Drizzle-discuss] UTF8, charset introducers

Roy Lyseng Thu, 11 Sep 2008 14:04:58 -0700


Jim Starkey wrote:

Jay Pipes wrote:

Regarding collations, Brian and I just chatted about this.  Currently,
MySQL only supports a single byte to indicate the collation, which means
that only 256 collations are supported by MySQL.  This is a problem,
since they've already run out of identifiers.

Brian thinks we can chop the number of supported collations down
significantly in Drizzle, because many of them are charset-specific, and
can restart the numbering from 0, meaning that the ABI would not need to
change.  This is important as Heikki has said that he's not willing to
update InnoDB to support a 2-byte collation identifier at this point.
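
To make the one-byte limit concrete, here is a rough sketch of the
renumbering idea. The list of surviving collations and the IDs assigned
to them are purely illustrative assumptions, not Drizzle's actual
numbering, and the helper names are hypothetical.

```python
import struct

def encode_collation_id(collation_id):
    # A single unsigned byte can only hold values 0..255;
    # struct.pack raises struct.error for anything outside that range.
    return struct.pack("B", collation_id)

def renumber(surviving):
    # Hypothetical helper: assign fresh IDs starting from 0 to the
    # collations that are kept, in list order.
    return {name: new_id for new_id, name in enumerate(surviving)}

# Illustrative subset only; the real list would be decided by the project.
surviving = ["binary", "utf8_general_ci", "utf8_bin"]
ids = renumber(surviving)
encoded = {name: encode_collation_id(i) for name, i in ids.items()}

try:
    encode_collation_id(256)
    overflow_rejected = False
except struct.error:
    overflow_rejected = True
```

As long as the trimmed list stays at 256 entries or fewer, every ID
still fits in the existing one-byte field.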

Jim, does this answer your question or were you looking for a different
answer?

Not really.  I've never quite understood how collations are supposed to
work in MySQL.  Is a collation a property of a session?  Of a
statement?  Of data?  And what does the number 127 have to do with the
zillion or so collations defined in the world?


Probably better than having only the collation "binary" :)

In HADB, we implemented collations as an attribute of the charset.  This
may not be the perfect solution (a collation should probably be
independent of charsets?), but from a practical viewpoint you might e.g.
use a homegrown collation implementation for the ASCII charset and the
ICU implementation for UTF-8.  When depending on different external
implementations it was difficult to guarantee similar behaviour of one
collation when dealing with different character sets.

What is the interaction between an index's collation and the session?
Can range retrievals use an index if the session collation does match
the index collation?

That is a good question...  The reason for having an index in a specific
collation is probably that it matches the collation of the locale of the
users.  Thus, if the user's locale's collation is the same as the
collation of the index, everything is fine.  The tradeoff occurs when a
user with a different locale/collation performs a query.  Generally you
cannot guarantee that the range covered by the index is the same as the
range the user expects, hence the index cannot be used.  I guess this is
a tradeoff that should be decided by the application programmer.
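
A small sketch of why the ranges differ, assuming an index built in a
binary collation and a session that wants case-insensitive comparison.
The two key functions and the range_scan helper are hypothetical
stand-ins for illustration, not code from MySQL, Drizzle or HADB.

```python
import bisect

def binary_key(s):
    # Stand-in for a "binary" collation: compare raw UTF-8 bytes.
    return s.encode("utf-8")

def ci_key(s):
    # Stand-in for a case-insensitive collation.
    return s.casefold()

def range_scan(index_keys, index_rows, lo, hi, key):
    # Return rows with lo <= row < hi, assuming index_keys is sorted
    # under the same collation that `key` implements.
    start = bisect.bisect_left(index_keys, key(lo))
    end = bisect.bisect_left(index_keys, key(hi))
    return index_rows[start:end]

rows = ["banana", "Apple", "cherry", "apple", "Banana"]

# The index is ordered by the binary collation.
index_rows = sorted(rows, key=binary_key)
index_keys = [binary_key(r) for r in index_rows]

# Range 'a' <= x < 'c' answered from the binary index.
via_index = range_scan(index_keys, index_rows, "a", "c", binary_key)

# What a case-insensitive session expects for the same range.
expected = sorted(
    (r for r in rows if ci_key("a") <= ci_key(r) < ci_key("c")),
    key=ci_key,
)
```

Here the binary index returns only the lowercase entries for the range,
while the case-insensitive session also expects "Apple" and "Banana",
which sit in a different part of the binary ordering.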

The question behind the question is how should Nimbus handle
collations.  I've got a cleaner piece of paper than you do.  Have any
advice?



_______________________________________________
Mailing list: https://launchpad.net/~drizzle-discuss
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help   : https://help.launchpad.net/ListHelp
