Hi Paul,

> Not sure if this helps, but increasing the value returned by
> scan_time() increases the cost of a table scan, so the optimizer is
> more likely to use an index scan.
This certainly helps. Thank you. I've been lurking around PBXT's source
code recently for some inspiration on this matter and noticed that both
InnoDB and PBXT have a comment that points out this function. I also
noticed that PBXT's records_in_range() is very concise and, at a glance,
it looks fast.

It would be awesome if we could talk about this next month :)

Cheers,
Toru

On Tue, Mar 23, 2010 at 7:06 PM, Paul McCullagh <[email protected]> wrote:
> Hi Toru,
>
> Not sure if this helps, but increasing the value returned by
> scan_time() increases the cost of a table scan, so the optimizer is
> more likely to use an index scan.
>
> On Mar 23, 2010, at 4:25 AM, Toru Maesaka wrote:
>
>> G'day Jay,
>>
>>> Hi Toru! By "prioritize" I'm assuming you mean "always prefer an index
>>> scan over a table scan"?
>>
>> Yep! I should have made this more clear :( I also realized that I
>> spoke too soon after sending the previous email. Returning a constant
>> cost factor in records_in_range() does seem to work. I really need to
>> thank Domas for pointing me to this approach.
>>
>> Another solution that could be neat is an API that would return a
>> boolean indicating whether the index should be used for certain query
>> types.
>>
>>> Actually, this is precisely what InnoDB does. :( Not only does this
>>> result in a call to malloc() for a copy of the MySQL key value
>>> converted into InnoDB's internal format, but it also results in a new
>>> InnoDB mem_heap allocation and a call to
>>> btr_estimate_n_rows_in_range(). For *every* call to
>>> Cursor::records_in_range().
>>
>> Ouch. Although I wonder what the performance difference would be if we
>> could optimize this away...
>>
>>> Not exactly efficient... I agree that the interface and implementation
>>> could be much simpler and cleaner.
>>
>> Yeah... What really caught my attention was that an engine could end
>> up doing the same search twice in a simple statement that involves
>> ranges.
>> That is, we provide the optimizer with a count of rows in the given
>> range by traversing the tree. If the optimizer decides to use the
>> index, then the engine would most likely go through the same search,
>> which I wanted to avoid.
>>
>> Cheers,
>> Toru
>>
>>
>> On Tue, Mar 23, 2010 at 4:31 AM, Jay Pipes <[email protected]> wrote:
>>>
>>> On Mon, Mar 22, 2010 at 1:05 PM, Toru Maesaka <[email protected]> wrote:
>>>>
>>>> Hi!
>>>>
>>>> Would it make sense to have an engine-level option to tell the
>>>> optimizer to prioritize index use at all times? Something along the
>>>> lines of HA_PRIORITIZE_INDEX.
>>>
>>> Hi Toru! By "prioritize" I'm assuming you mean "always prefer an index
>>> scan over a table scan"?
>>>
>>> Or do you mean something else?
>>>
>>>> I'm saying this because the relationship between
>>>> Cursor::records_in_range() and HA_READ_RANGE seems important for
>>>> effectively performing range scans at the moment. I tried purposely
>>>> returning a constant small cost factor in records_in_range(), but it
>>>> seems that the range search will fail if the number of estimated rows
>>>> (in my case a total lie) is significantly smaller than the number of
>>>> actual rows in the table. If I don't set HA_READ_RANGE, then it seems
>>>> Drizzle will resort to a table scan (despite knowing that
>>>> 'possible_keys' exist in EXPLAIN).
>>>>
>>>> Preferably, I don't want to count through the number of rows in a
>>>> range for the optimizer on every simple query. Although... a
>>>> completely dishonest implementation of records_in_range() would dirty
>>>> the EXPLAIN statement result. So there is definitely a downside to
>>>> what I was experimenting with.
>>>
>>> Actually, this is precisely what InnoDB does. :( Not only does this
>>> result in a call to malloc() for a copy of the MySQL key value
>>> converted into InnoDB's internal format, but it also results in a new
>>> InnoDB mem_heap allocation and a call to
>>> btr_estimate_n_rows_in_range().
>>> For *every* call to
>>> Cursor::records_in_range().
>>>
>>> Not exactly efficient... I agree that the interface and implementation
>>> could be much simpler and cleaner.
>>>
>>> -jay
>>>
>>>> It would be great to hear what you all think :)
>>>>
>>>> - Toru
>>>>
>>>> _______________________________________________
>>>> Mailing list: https://launchpad.net/~drizzle-discuss
>>>> Post to     : [email protected]
>>>> Unsubscribe : https://launchpad.net/~drizzle-discuss
>>>> More help   : https://help.launchpad.net/ListHelp
>>>>
>>>
>>
>>
>> --
>> Toru Maesaka <[email protected]>
>
>
> --
> Paul McCullagh
> PrimeBase Technologies
> www.primebase.org
> www.blobstreaming.org
> pbxt.blogspot.com


--
Toru Maesaka <[email protected]>

