Hi Tom,

> I am having trouble locating the specific docs at this time, I will look
> deeper tonight.

Cool! Thanks for your help.

> I believe that both the MySQL memory storage engine and the ndbcluster
> storage engine perform some tricks to force the optimizer to favor one type
> of access over another.

I don't know the NDB internals, but I've been looking at MEMORY/HEAP's
source quite a bit lately, and the code for records_in_range() seems to
do nothing more than search the Red-Black Tree. In brief (see the
sketch after this list):

  (1) If min_key exists, get the position of it in the tree.
  (2) If max_key exists, get the position of it in the tree.
  (3) Subtract the position of min_key from that of max_key.

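To make that concrete, here's a rough sketch of the idea in C++, using
std::map (typically a red-black tree) as a stand-in for HEAP's internal
tree. The names are mine and not the actual engine API, and
std::distance() walks linearly where the real tree computes positions
more cheaply:

  #include <cstdint>
  #include <iterator>
  #include <map>

  // Sketch only: estimate the rows in [min_key, max_key] by taking the
  // tree position of each boundary and subtracting.
  uint64_t records_in_range_sketch(const std::map<int, int> &index,
                                   const int *min_key, const int *max_key)
  {
    // (1) Position of min_key, or the start of the tree if absent.
    auto lo = min_key ? index.lower_bound(*min_key) : index.begin();
    // (2) Position of max_key, or the end of the tree if absent.
    auto hi = max_key ? index.upper_bound(*max_key) : index.end();
    // (3) The estimate is just the difference of the two positions.
    return static_cast<uint64_t>(std::distance(lo, hi));
  }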
So, what it's doing seems pretty darn simple (although I'll recheck
this). What I've started studying, though, is InnoDB's approach.
There's pretty good documentation on it too =)

http://www.innodb.com/doc/innodb_plugin-1.0/innodb-other-changes.html#innodb-other-changes-statistics-estimation
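From what I've read so far, InnoDB dives the B-tree down to both range
boundaries and extrapolates from what it sees along the two paths. A
very rough, hypothetical sketch of that idea (these structures and
names are mine, not InnoDB's actual code):

  #include <cstdint>
  #include <vector>

  // One entry per B-tree level on the dive from the root to the leaves.
  struct DiveLevel {
    uint64_t left_slot;   // slot the lower boundary descends through
    uint64_t right_slot;  // slot the upper boundary descends through
    double   avg_fanout;  // average entries per page at this level
  };

  // Once the two dive paths diverge, start from the number of subtrees
  // between them and scale by the average fanout of each level below.
  uint64_t estimate_rows(const std::vector<DiveLevel> &path)
  {
    double estimate = 0.0;
    bool diverged = false;
    for (const DiveLevel &lvl : path) {
      if (!diverged && lvl.right_slot > lvl.left_slot) {
        estimate = static_cast<double>(lvl.right_slot - lvl.left_slot);
        diverged = true;
      } else if (diverged) {
        estimate *= lvl.avg_fanout;
      }
    }
    // If the boundaries never diverge they share a leaf; call it one row.
    return diverged ? static_cast<uint64_t>(estimate) : 1;
  }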

Cheers,
Toru


On Wed, Mar 24, 2010 at 3:38 AM, Tom Hanlon <[email protected]> wrote:
> Toru,
>
> I am having trouble locating the specific docs at this time, I will look
> deeper tonight.
>
> Whether this applies to your specific range optimization, I am not sure.
>
> But...
> I believe that both the MySQL memory storage engine and the ndbcluster
> storage engine perform some tricks to force the optimizer to favor one type
> of access over another.
>
> Memory, because both the index and the data are in memory, versus the
> usual assumption of index in memory, data on disk.
>
> Cluster, for similar reasons, plus additional details.
>
> I will see if I can find the docs.
>
> --
> Tom
>
> On 22 Mar 2010, at 13:05, Toru Maesaka wrote:
>
>> Hi!
>>
>> Would it make sense to have an engine-level option to tell the
>> optimizer to prioritize index use at all times? Something along the
>> lines of HA_PRIORITIZE_INDEX.
>>
>> I'm saying this because the relationship between
>> Cursor::records_in_range() and HA_READ_RANGE currently seems crucial
>> to performing range scans effectively. I tried purposely returning a
>> constant small cost factor in records_in_range(), but it seems the
>> range search will fail if the number of estimated rows (in my case a
>> total lie) is significantly smaller than the number of actual rows in
>> the table. And if I don't set HA_READ_RANGE, it seems Drizzle will
>> resort to a table scan (despite knowing that 'possible_keys' exist in
>> EXPLAIN).
>>
>> Ideally, I don't want to count the rows in a range for the optimizer
>> on every simple query. Then again, a completely dishonest
>> implementation of records_in_range() would dirty the EXPLAIN output,
>> so there is definitely a downside to what I was experimenting with.
>>
>> It would be great to hear what you all think :)
>>
>> - Toru
>>
>
> Tom Hanlon
> [email protected]
> Cloudera Certified Hadoop Developer
> Certified MySQL DBA
>
>

-- 
Toru Maesaka <[email protected]>
