Hi Tom,

> I am having trouble locating the specific docs at this time, I will look
> deeper tonight.
Cool! Thanks for your help.

> I believe that both the MySQL memory storage engine, and the ndbcluster
> storage engine perform some tricks to force the optimizer to favor one
> type of access over another.

I don't know the NDB internals, but I've been looking at MEMORY/HEAP's
source quite a bit lately, and the code for records_in_range() seems to
only search through the red-black tree. In brief:

(1) If min_key exists, get its position in the tree.
(2) If max_key exists, get its position in the tree.
(3) Subtract the position of min_key from the position of max_key.

So what it's doing seems pretty darn simple (although I'll recheck this).
What I've started studying, though, is InnoDB's approach. There's pretty
good documentation on it too =)

http://www.innodb.com/doc/innodb_plugin-1.0/innodb-other-changes.html#innodb-other-changes-statistics-estimation

Cheers,
Toru

On Wed, Mar 24, 2010 at 3:38 AM, Tom Hanlon <[email protected]> wrote:
> Toru,
>
> I am having trouble locating the specific docs at this time, I will look
> deeper tonight.
>
> Whether this applies to your specific range optimization, I am not sure.
>
> But...
> I believe that both the MySQL memory storage engine, and the ndbcluster
> storage engine perform some tricks to force the optimizer to favor one
> type of access over another.
>
> Memory, because the index and the data are in memory, vs. the usual
> analysis of index in memory, data on disk.
>
> Cluster, for similar reasons, plus additional details.
>
> I will see if I can find the docs.
>
> --
> Tom
>
> On 22 Mar 2010, at 13:05, Toru Maesaka wrote:
>
>> Hi!
>>
>> Would it make sense to have an engine-level option to tell the
>> optimizer to prioritize index use at all times? Something along the
>> lines of HA_PRIORITIZE_INDEX.
>>
>> I'm saying this because the relationship between
>> Cursor::records_in_range() and HA_READ_RANGE seems important for
>> effectively performing range scans at the moment.
>> I tried purposely
>> returning a constant small cost factor from records_in_range(), but it
>> seems that the range search will fail if the number of estimated rows
>> (in my case, a total lie) is significantly smaller than the number of
>> actual rows in the table. If I don't set HA_READ_RANGE, then it seems
>> Drizzle will resort to a table scan (despite knowing that
>> 'possible_keys' exist in EXPLAIN).
>>
>> Preferably, I don't want to count through the number of rows in a
>> range for the optimizer on every simple query. Although... a
>> completely dishonest implementation of records_in_range() would dirty
>> the EXPLAIN statement result. So there is definitely a downside to
>> what I was experimenting with.
>>
>> It would be great to hear what you all think :)
>>
>> - Toru
>
> Tom Hanlon
> [email protected]
> Cloudera Certified Hadoop Developer
> Certified MySQL DBA

--
Toru Maesaka <[email protected]>

_______________________________________________
Mailing list: https://launchpad.net/~drizzle-discuss
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help   : https://help.launchpad.net/ListHelp

