[ https://issues.apache.org/jira/browse/HBASE-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mikhail Bautin resolved HBASE-4465.
-----------------------------------
Resolution: Fixed
> Lazy-seek optimization for StoreFile scanners
> ---------------------------------------------
>
> Key: HBASE-4465
> URL: https://issues.apache.org/jira/browse/HBASE-4465
> Project: HBase
> Issue Type: Improvement
> Reporter: Mikhail Bautin
> Assignee: Mikhail Bautin
> Labels: optimization, seek
> Fix For: 0.94.0, 0.89.20100924
>
> Attachments:
> HBASE-4465_Lazy-seek_optimization_for_St-20111005121052-b2ea8753.patch
>
>
> Previously, if we had several StoreFiles for a column family in a region, we
> would seek in each of them and only then merge the results, even though the
> row/column we are looking for might only be in the most recent (and the
> smallest) file. Now we prioritize our reads from those files so that we check
> the most recent file first. This is done by doing a "lazy seek", which
> pretends that the next value in the StoreFile is (seekRow, seekColumn,
> lastTimestampInStoreFile), where lastTimestampInStoreFile is the highest
> timestamp in that file. Since KVs for the same row/column sort newest first,
> this fake KV is no later in the KV order than any KV the file could actually
> return for the seek. So if we don't find the result in the more recent
> files, that fake KV will bubble up to the top of the KV heap and a real
> seek will be done. This is expected to significantly reduce the amount
> of disk IO (as of 09/22/2011 we are doing dark launch testing and
> measurement).
> This is joint work with Liyin Tang -- huge thanks to him for many helpful
> discussions on this and the idea of putting fake KVs with the highest
> timestamp of the StoreFile in the scanner priority queue.
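
The lazy-seek idea described above can be illustrated with a small toy model.
This is only a sketch under stated assumptions, not the code from the attached
patch: it is written in Python rather than HBase's Java, it models each
StoreFile as a sorted in-memory list, and the names FileScanner, kv_key and
get_latest are hypothetical. Ties between equal keys are broken by scanner
index, which is also an assumption of this sketch.

```python
import heapq
from bisect import bisect_left


def kv_key(row, column, ts):
    # Hypothetical key model: negating the timestamp makes plain tuple order
    # sort newer KVs first within the same row/column.
    return (row, column, -ts)


class FileScanner:
    """Toy stand-in for a StoreFile scanner (assumes a non-empty file)."""

    def __init__(self, name, kvs):
        # kvs: iterable of (row, column, timestamp, value)
        kvs = list(kvs)
        self.name = name
        self.entries = sorted((kv_key(r, c, ts), v) for r, c, ts, v in kvs)
        self.keys = [k for k, _ in self.entries]
        self.max_ts = max(ts for _, _, ts, _ in kvs)
        self.pos = 0
        self.pending = None   # (row, column) of a seek not yet performed
        self.real_seeks = 0   # counts the expensive seeks actually done

    def lazy_seek(self, row, column):
        # Do no work yet; just remember where a real seek would have to go.
        self.pending = (row, column)

    def real_seek(self):
        row, column = self.pending
        self.real_seeks += 1
        # (row, column) is a prefix, so it sorts before every
        # (row, column, -ts) key: this finds the first KV at or after it.
        self.pos = bisect_left(self.keys, (row, column))
        self.pending = None

    def peek(self):
        if self.pending is not None:
            # Fake KV: seek row/column with the file's highest timestamp.
            return kv_key(self.pending[0], self.pending[1], self.max_ts)
        return self.keys[self.pos] if self.pos < len(self.keys) else None


def get_latest(scanners, row, column):
    """Return the newest value for (row, column), or None if absent."""
    heap = []
    for i, s in enumerate(scanners):
        s.lazy_seek(row, column)
        heapq.heappush(heap, (s.peek(), i))
    while heap:
        key, i = heapq.heappop(heap)
        s = scanners[i]
        if s.pending is not None:
            # A fake KV reached the top of the heap: seek for real now.
            s.real_seek()
            if s.peek() is not None:
                heapq.heappush(heap, (s.peek(), i))
            continue
        # A real KV is on top; it is the answer only if it matches the seek.
        return s.entries[s.pos][1] if key[:2] == (row, column) else None
    return None


if __name__ == "__main__":
    files = [
        FileScanner("newest", [("r1", "c1", 30, "v30")]),
        FileScanner("middle", [("r1", "c1", 20, "v20")]),
        FileScanner("oldest", [("r1", "c1", 10, "v10")]),
    ]
    print(get_latest(files, "r1", "c1"), [f.real_seeks for f in files])
```

In this toy run only the newest file is actually seeked; the fake KVs of the
two older files never reach the top of the heap, so their seeks are skipped.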