I think Drizzle should consider the ability to push OFFSET down to the storage engine. LIMIT/OFFSET is very common in Web applications: where there are lists, there are LIMIT/OFFSETs. For example, each user of a blog has a list of his articles. Because this list can be long, we display it in pages, using the following SQL:
SELECT <needed attributes> FROM Article WHERE UserID = ? ORDER BY PublishTime DESC LIMIT <page size> OFFSET <page size * page index>;

Of course we create an index on UserID+PublishTime, so that no sorting is needed. However, this method doesn't scale well. If some of the needed attributes are not in the index, it generally doesn't scale beyond an OFFSET of about 100, because there will be too many random reads. Today we have two options to improve the performance:

1. Add all the needed attributes to the index. This generally leads to an index that is too fat, so we don't use this option often.

2. Fetch a list of keys first, then fetch the needed attributes using those keys. That is, use the following code:

<key list> = SELECT KeyAttribute FROM Article WHERE UserID = ? ORDER BY PublishTime DESC LIMIT <page size> OFFSET <page size * page index>;
SELECT ... FROM Article WHERE KeyAttribute IN (<key list>);

We add KeyAttribute to the index, of course. However, even the second option is far from ideal, for several reasons:

1. Today, Drizzle calls index_next <page size * page index> times and throws away the results. This is a big waste.

2. Some rows could have been deleted after you fetch the key list; you have to handle that, or use a transaction (yet another overhead).

3. "SELECT ... FROM Article WHERE KeyAttribute IN (<key list>);" can be a long SQL statement, so the parsing overhead is considerable.

Because of these performance issues, the second option generally doesn't scale beyond an OFFSET of about 1000. We measured this in a real Web 2.0 application.

However, if we could push OFFSET down to the storage engine, performance could be dramatically improved. In brief, an optimized storage engine could skip many index entries at a time.
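To make the two options concrete, here is a small sketch using Python's sqlite3 module. The schema, index layout, and the choice of ArticleID as KeyAttribute are illustrative assumptions, not Drizzle specifics:

```python
import sqlite3

# Toy Article table; ArticleID stands in for KeyAttribute.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Article ("
             "ArticleID INTEGER PRIMARY KEY, "
             "UserID INTEGER, PublishTime INTEGER, Body TEXT)")
# The UserID+PublishTime index; the key attribute rides along for option 2.
conn.execute("CREATE INDEX idx_user_time ON Article "
             "(UserID, PublishTime, ArticleID)")
conn.executemany(
    "INSERT INTO Article (UserID, PublishTime, Body) VALUES (?, ?, ?)",
    [(1, t, "body %d" % t) for t in range(100)])

PAGE_SIZE = 10

def fetch_page_simple(user_id, page_index):
    """Single-query method: the engine still steps over every skipped row."""
    return conn.execute(
        "SELECT ArticleID, PublishTime, Body FROM Article "
        "WHERE UserID = ? ORDER BY PublishTime DESC LIMIT ? OFFSET ?",
        (user_id, PAGE_SIZE, PAGE_SIZE * page_index)).fetchall()

def fetch_page_two_step(user_id, page_index):
    """Option 2: index-only key fetch, then fetch the full rows by key."""
    keys = [r[0] for r in conn.execute(
        "SELECT ArticleID FROM Article WHERE UserID = ? "
        "ORDER BY PublishTime DESC LIMIT ? OFFSET ?",
        (user_id, PAGE_SIZE, PAGE_SIZE * page_index))]
    if not keys:
        return []
    # IN (...) does not preserve the key order, so we re-sort. Note how
    # long this statement grows with the page size (reason 3 above).
    placeholders = ",".join("?" * len(keys))
    return conn.execute(
        "SELECT ArticleID, PublishTime, Body FROM Article "
        "WHERE ArticleID IN (%s) ORDER BY PublishTime DESC" % placeholders,
        keys).fetchall()

page2 = fetch_page_simple(1, 2)          # offset 20: times 79..70
assert page2 == fetch_page_two_step(1, 2)
```

Both functions return the same page; the difference is only in how much work the engine does under the hood, which is exactly what the OFFSET pushdown would address.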
For example, if both the minimal key and the maximal key in an index leaf page match the scan condition, the whole page can be skipped, with the number of keys in that page counted toward the current OFFSET in one step. Because an index page generally contains hundreds of keys, this can lead to a 100x speedup. What's more, we could then use the simple single-query method again.

We are developing a storage engine and are considering this optimization. However, the storage engine does not know the OFFSET, nor whether the OFFSET can be pushed down. Hence this feature request.

_______________________________________________
Mailing list: https://launchpad.net/~drizzle-discuss
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help   : https://help.launchpad.net/ListHelp
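The leaf-page skipping described above can be sketched in a few lines. This is a toy model, not engine code: the page layout, the page capacity, and the key format are all illustrative assumptions.

```python
# Toy model of OFFSET pushdown: an index whose leaf pages hold sorted
# keys, where a whole page is skipped at once while consuming OFFSET
# instead of stepping over its entries one by one.

PAGE_CAPACITY = 4  # real engines hold hundreds of keys per leaf page

def paginate_with_pushdown(pages, match, offset, limit):
    """Scan index leaf pages; skip whole pages while OFFSET remains.

    pages: list of leaf pages, each a sorted list of keys
    match: predicate on a key (the scan condition, e.g. UserID = ?)
    Returns (result keys, number of individual keys actually touched).
    """
    out = []
    steps = 0
    for page in pages:
        if not page:
            continue
        # If the smallest and largest keys both match and the whole page
        # still falls inside the remaining OFFSET, skip it in O(1).
        if offset >= len(page) and match(page[0]) and match(page[-1]):
            offset -= len(page)
            continue
        for key in page:  # otherwise walk the page entry by entry
            steps += 1
            if not match(key):
                continue
            if offset > 0:
                offset -= 1
                continue
            out.append(key)
            if len(out) == limit:
                return out, steps
    return out, steps

# Keys are (UserID, PublishTime); all 20 belong to user 1 here.
keys = [(1, t) for t in range(20)]
pages = [keys[i:i + PAGE_CAPACITY] for i in range(0, len(keys), PAGE_CAPACITY)]
rows, steps = paginate_with_pushdown(pages, lambda k: k[0] == 1,
                                     offset=13, limit=4)
# Three full pages (12 keys) are skipped without touching their entries;
# only 5 keys are examined, versus 17 for an entry-by-entry scan.
```

The savings scale with the page capacity, which is why hundreds of keys per page translates into roughly a 100x reduction in index_next-style calls.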

