G'day Jay,

> Hi Toru! By "prioritize" I'm assuming you mean "always prefer an index
> scan over a table scan"?

Yep! I should have made this clearer :( I also realized that I spoke
too soon after sending the previous email. Returning a constant cost
factor in records_in_range() does seem to work. I really need to
thank Domas for pointing me to this approach.
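
For anyone curious, the gist of what I'm doing now is just this (a
rough sketch from memory; MyCursor stands in for the engine's Cursor
subclass, the exact signature may differ in trunk, and the constant
is arbitrary):

  ha_rows MyCursor::records_in_range(uint32_t, key_range *, key_range *)
  {
    /* Return a constant estimate instead of walking the tree.
       The value matters: lying with something far smaller than the
       real row count broke the range search for me earlier. */
    return 10;
  }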

Another solution that could be neat is an API that returns a boolean
indicating whether the index should be used for certain query types.
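
Something like this is what I had in mind (entirely hypothetical; the
method name and the query-type argument are invented, I'm just
sketching the shape of it):

  /* Hypothetical Cursor method: the engine declares outright whether
     an index is worth using for a given class of query, so the
     optimizer never needs a row estimate from us. */
  virtual bool index_preferred(uint32_t index, uint32_t query_type)
  {
    (void) index;
    (void) query_type;
    return true;  /* this engine always wants its indexes used */
  }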

> Actually, this is precisely what InnoDB does. :(  Not only does this
> result in a call to malloc() for a copy of the MySQL key value
> converted into InnoDB's internal format, but it also results in a new
> InnoDB mem_heap allocation and a call to
> btr_estimate_n_rows_in_range().  For *every* call to
> Cursor::records_in_range().

Ouch. I do wonder what the performance difference would be if we
could optimize this away, though...

> Not exactly efficient...I agree that the interface and implementation
> could be much simpler and cleaner.

Yeah... What really caught my attention was that an engine could end
up doing the same search twice in a simple statement that involves
ranges. That is, we give the optimizer a count of rows in the given
range by traversing the tree. If the optimizer then decides to use
the index, the engine will most likely repeat that same search, which
is what I wanted to avoid.
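
In rough pseudo-C++, the pattern I'm describing is (hand-wavy, not
actual optimizer code; names follow the Cursor API as I remember it):

  /* 1) Planning: the optimizer asks for an estimate, and the engine
        descends its tree just to count the rows in the range. */
  ha_rows estimate= cursor->records_in_range(idx, &min_key, &max_key);

  /* 2) Execution: if the index wins, the engine descends the very
        same tree again to position the cursor at the range start. */
  cursor->index_read(buf, min_key.key, min_key.length,
                     HA_READ_KEY_OR_NEXT);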

Cheers,
Toru


On Tue, Mar 23, 2010 at 4:31 AM, Jay Pipes <[email protected]> wrote:
> On Mon, Mar 22, 2010 at 1:05 PM, Toru Maesaka <[email protected]> wrote:
>> Hi!
>>
>> Would it make sense to have an engine-level option to tell the
>> optimizer to prioritize index use at all times? Something along the
>> lines of HA_PRIORITIZE_INDEX.
>
> Hi Toru! By "prioritize" I'm assuming you mean "always prefer an index
> scan over a table scan"?
>
> Or do you mean something else?
>
>> I'm saying this because the relationship between
>> Cursor::records_in_range() and HA_READ_RANGE currently seems
>> important for performing range scans effectively. I tried purposely
>> returning a constant small cost factor in records_in_range(), but it
>> seems that the range search will fail if the number of estimated rows
>> (in my case a total lie) is significantly smaller than the number of
>> actual rows in the table. If I don't set HA_READ_RANGE, then it seems
>> Drizzle will resort to a table scan (despite EXPLAIN listing the
>> index under 'possible_keys').
>>
>> Preferably, I don't want to count the rows in a range for the
>> optimizer on every simple query. Then again, a completely dishonest
>> implementation of records_in_range() would dirty the EXPLAIN output,
>> so there is definitely a downside to what I was experimenting with.
>
> Actually, this is precisely what InnoDB does. :(  Not only does this
> result in a call to malloc() for a copy of the MySQL key value
> converted into InnoDB's internal format, but it also results in a new
> InnoDB mem_heap allocation and a call to
> btr_estimate_n_rows_in_range().  For *every* call to
> Cursor::records_in_range().
>
> Not exactly efficient...I agree that the interface and implementation
> could be much simpler and cleaner.
>
> -jay
>
>> It would be great to hear what you all think :)
>>
>> - Toru
>>
>



-- 
Toru Maesaka <[email protected]>

_______________________________________________
Mailing list: https://launchpad.net/~drizzle-discuss
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help   : https://help.launchpad.net/ListHelp
