long running scan lose benefit of bloomfilters and timerange hints
------------------------------------------------------------------
Key: HBASE-4823
URL: https://issues.apache.org/jira/browse/HBASE-4823
Project: HBase
Issue Type: Bug
Reporter: Kannan Muthukkaruppan
Assignee: Kannan Muthukkaruppan
When you have a long-running scan (for example, one driven by an MR job), you
can lose the benefit of timerange hints and bloom filters midway if your
scanner gets reset.
[Note: Scanners can get reset by, for example, a flush or a compaction.]
In one of our workloads, we periodically want to do rollups on the most recent
15 minutes of data in a column family, but the timerange hint benefit is lost
midway when this resetScannerStack (shown below) happens. The end result is
that we end up reading all the old HFiles rather than just the recent HFiles.
{code}
private void resetScannerStack(KeyValue lastTopKey) throws IOException {
  if (heap != null) {
    throw new RuntimeException("StoreScanner.reseek run on an existing heap!");
  }

  /* When we have the scan object, should we not pass it to getScanners()
   * to get a limited set of scanners? We did so in the constructor and we
   * could have done it now by storing the scan object from the constructor */
  List<KeyValueScanner> scanners = getScanners();
{code}
The comment in the code seems to be aware of this issue and even has the
suggested fix!
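
To make the effect concrete, here is a minimal standalone sketch of the idea in that comment: keep the scan's time range from construction and apply it again on reset. This is not HBase code. FileStub, TimeRangeStub, ScannerSketch and their methods are hypothetical stand-ins invented for illustration, and the sketch models only the timerange filtering, not bloom filters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified stand-in for a store file. Hypothetical; not the HBase StoreFile class. */
final class FileStub {
    final String name;
    final long minTs; // smallest cell timestamp in the file
    final long maxTs; // largest cell timestamp in the file

    FileStub(String name, long minTs, long maxTs) {
        this.name = name;
        this.minTs = minTs;
        this.maxTs = maxTs;
    }
}

/** Simplified stand-in for a scan's time range, [minStamp, maxStamp). Hypothetical. */
final class TimeRangeStub {
    final long minStamp;
    final long maxStamp;

    TimeRangeStub(long minStamp, long maxStamp) {
        this.minStamp = minStamp;
        this.maxStamp = maxStamp;
    }

    /** True if the file could hold a cell inside this range. */
    boolean overlaps(FileStub f) {
        return f.maxTs >= minStamp && f.minTs < maxStamp;
    }
}

/** Hypothetical model of a store scanner that remembers its scan's time range. */
final class ScannerSketch {
    private final List<FileStub> storeFiles;
    private final TimeRangeStub range; // stored at construction so a reset can reuse it

    ScannerSketch(List<FileStub> storeFiles, TimeRangeStub range) {
        this.storeFiles = new ArrayList<>(storeFiles);
        this.range = range;
    }

    /** Models the reported behavior: the reset reopens every file. */
    List<String> resetIgnoringScan() {
        List<String> opened = new ArrayList<>();
        for (FileStub f : storeFiles) {
            opened.add(f.name);
        }
        return opened;
    }

    /** Models the suggested fix: the reset applies the stored time range again. */
    List<String> resetUsingStoredScan() {
        List<String> opened = new ArrayList<>();
        for (FileStub f : storeFiles) {
            if (range.overlaps(f)) {
                opened.add(f.name);
            }
        }
        return opened;
    }
}

final class ScannerResetDemo {
    public static void main(String[] args) {
        List<FileStub> files = Arrays.asList(
                new FileStub("old-1", 0L, 1_000L),
                new FileStub("old-2", 1_000L, 2_000L),
                new FileStub("recent", 9_000L, 10_000L));
        ScannerSketch scanner =
                new ScannerSketch(files, new TimeRangeStub(9_100L, 10_001L));
        System.out.println("reset without scan:     " + scanner.resetIgnoringScan());
        System.out.println("reset with stored scan: " + scanner.resetUsingStoredScan());
    }
}
```

In this toy model the reset that ignores the scan reopens all three files, while the reset that reuses the stored range reopens only the recent one. The actual change in StoreScanner would presumably store the scan object in the constructor and pass it to getScanners() on reset, as the comment suggests.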