Re: [Drizzle-discuss] shorter index keys (was: latin1 and swe7 character sets)

Clint Byrum Sun, 03 Apr 2011 09:35:34 -0700

Excerpts from Brian Aker's message of Sat Apr 02 18:13:36 -0700 2011:
> Hi!
> 
> For latin1 and swe7 should we accept them as character set specifiers for 
> ease of use? I believe they are a subset of UTF-8.


As a sub-concern: UTF-8 leads to 3-bytes-per-position indexes right
now. I have to wonder if it would be easy to create a new index type that
reserves only 2 bytes per character, for situations where that is
acceptable. The question of what to do with 3-byte chars would need some
thought, but my first inclination is that they would be rejected,
or possibly just stripped out (meaning unique indexes and index scans
would no longer be useful).
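
To make the "reject" option concrete, here is a minimal sketch in Python of the check such an index type would need before accepting a value. The function names are hypothetical and nothing here reflects Drizzle's actual code; it only relies on the fact that UTF-8 encodes U+0000..U+07FF in at most 2 bytes.

```python
def max_utf8_width(text: str) -> int:
    """Return the largest number of UTF-8 bytes needed by any one character."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


def fits_two_byte_index(text: str) -> bool:
    """True if every character encodes to at most 2 bytes in UTF-8.

    Hypothetical acceptance test for a 2-bytes-per-position index:
    a value that fails this check would be rejected.
    """
    return max_utf8_width(text) <= 2
```

Latin-script text with accents (for example "smörgåsbord") passes this check, while CJK text and symbols such as the euro sign need 3 bytes and would be rejected.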

Before people go all up in arms about full support of CJK, this would
be something optional where users who don't ever expect to see 3-byte
UTF-8 in their content could optimize. The current situation actually
favors CJK, which typically carries more information in each character
and so will likely get more use out of the 3-bytes-per-position
indexing scheme.
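
A rough sketch of the arithmetic behind that, in Python. It assumes the index reserves a fixed number of bytes per character position, as described above; the 255-character column length and the function names are illustrative only.

```python
def reserved_key_bytes(char_positions: int, bytes_per_position: int) -> int:
    """Bytes reserved for a key when every position gets a fixed width."""
    return char_positions * bytes_per_position


def bytes_actually_used(text: str) -> int:
    """Bytes the value really needs when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# Illustrative 255-character column: 3 bytes vs 2 bytes per position.
three_byte_key = reserved_key_bytes(255, 3)
two_byte_key = reserved_key_bytes(255, 2)

# How much of a 3-bytes-per-position reservation each value fills.
ascii_used = bytes_actually_used("abc")      # 3 characters, 1 byte each
cjk_used = bytes_actually_used("日本語")      # 3 characters, 3 bytes each
reserved_for_three_chars = reserved_key_bytes(3, 3)
```

Under these assumptions the ASCII value fills a third of its reservation while the CJK value fills all of it, and a 2-bytes-per-position key on the same column would be a third smaller.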

Another index type I'd like to see is a hash index. Apologies if it
already exists. :)

