Lazy-seek optimization for StoreFile scanners
---------------------------------------------

                 Key: HBASE-4465
                 URL: https://issues.apache.org/jira/browse/HBASE-4465
             Project: HBase
          Issue Type: Improvement
            Reporter: Mikhail Bautin
            Assignee: Mikhail Bautin
             Fix For: 0.92.0, 0.94.0, 0.89.20100924


Previously, if a region had several StoreFiles for a column family, we would
seek in every one of them and only then merge the results, even though the
row/column we are looking for might only be in the most recent (and smallest)
file. Now we prioritize reads so that the most recent file is checked first.
This is done with a "lazy seek": instead of seeking, the scanner pretends that
the next value in the StoreFile is (seekRow, seekColumn,
lastTimestampInStoreFile), where lastTimestampInStoreFile is the highest
timestamp in that file. Because KVs for the same row/column sort newest first,
this fake KV is no later in the KV order than anything that might actually
occur in the file at or after the seek position. If the result is not found in
the files holding newer data, the fake KV bubbles up to the top of the KV heap
and only then is a real seek done. This is expected to significantly reduce
the amount of disk IO (as of 09/22/2011 we are doing dark launch testing and
measurement).
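
The idea can be illustrated with a small self-contained sketch. This is not
the HBase code: LazySeekSketch, KV, FileScanner, lazySeek, realSeek and get
are hypothetical names, a TreeSet stands in for a StoreFile, and the rule that
a real KV wins a tie against a fake one is an assumption made for the example.

```java
import java.util.PriorityQueue;
import java.util.TreeSet;

public class LazySeekSketch {

    /** Simplified KV: row and column ascending, then timestamp DESCENDING. */
    static final class KV implements Comparable<KV> {
        final String row, col;
        final long ts;
        final boolean fake; // true for the placeholder created by a lazy seek

        KV(String row, String col, long ts, boolean fake) {
            this.row = row; this.col = col; this.ts = ts; this.fake = fake;
        }

        @Override public int compareTo(KV o) {
            int c = row.compareTo(o.row);
            if (c == 0) c = col.compareTo(o.col);
            if (c == 0) c = Long.compare(o.ts, ts); // higher timestamp sorts first
            return c;
        }
    }

    /** Stand-in for a StoreFile scanner; counts how many real seeks it performs. */
    static final class FileScanner {
        final TreeSet<KV> data = new TreeSet<>();
        long maxTs = Long.MIN_VALUE; // highest timestamp stored in this "file"
        KV current;
        int realSeeks = 0;

        FileScanner(KV... kvs) {
            for (KV kv : kvs) { data.add(kv); maxTs = Math.max(maxTs, kv.ts); }
        }

        /** No IO: pretend the next KV is (row, col, maxTs). */
        void lazySeek(String row, String col) {
            current = new KV(row, col, maxTs, true);
        }

        /** The expensive step: find the first stored KV at or after (row, col). */
        void realSeek() {
            realSeeks++;
            current = data.ceiling(new KV(current.row, current.col, Long.MAX_VALUE, false));
        }
    }

    /** Returns the newest KV for (row, col), or null if no file contains it. */
    static KV get(String row, String col, FileScanner... scanners) {
        PriorityQueue<FileScanner> heap = new PriorityQueue<>((a, b) -> {
            int c = a.current.compareTo(b.current);
            // Assumption of this sketch: on a tie, a real KV goes before a fake one.
            return c != 0 ? c : Boolean.compare(a.current.fake, b.current.fake);
        });
        for (FileScanner s : scanners) { s.lazySeek(row, col); heap.add(s); }
        while (!heap.isEmpty()) {
            FileScanner top = heap.poll();
            if (top.current.fake) {
                // The fake KV reached the top of the heap: only now do the real seek.
                top.realSeek();
                if (top.current != null) heap.add(top);
                continue;
            }
            KV kv = top.current;
            return (kv.row.equals(row) && kv.col.equals(col)) ? kv : null;
        }
        return null;
    }

    public static void main(String[] args) {
        FileScanner newer = new FileScanner(new KV("r1", "c1", 200, false));
        FileScanner older = new FileScanner(
                new KV("r1", "c1", 100, false), new KV("r2", "c1", 90, false));
        KV hit = get("r1", "c1", newer, older);
        System.out.println("ts=" + hit.ts + " newerSeeks=" + newer.realSeeks
                + " olderSeeks=" + older.realSeeks);
    }
}
```

In this sketch the lookup of (r1, c1) is answered from the newer file alone:
its real KV at timestamp 200 sorts ahead of the older file's fake KV at
timestamp 100, so the older file is never really seeked. A lookup of (r2, c1)
falls through to the older file once the newer one comes up empty.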

This is joint work with Liyin Tang -- huge thanks to him for many helpful 
discussions on this and the idea of putting fake KVs with the highest timestamp 
of the StoreFile in the scanner priority queue.


