[ https://issues.apache.org/jira/browse/HBASE-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153655#comment-13153655 ]
Prakash Khemani commented on HBASE-4823:
----------------------------------------

https://issues.apache.org/jira/browse/HBASE-3415 is also related

> long running scans lose benefit of bloomfilters and timerange hints
> -------------------------------------------------------------------
>
>                 Key: HBASE-4823
>                 URL: https://issues.apache.org/jira/browse/HBASE-4823
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Kannan Muthukkaruppan
>
> When you have a long running scan due to say an MR job, you can lose the
> benefit of timerange hints & bloom filters midway if your scanner gets reset.
> [Note: The scanners can get reset say due to a flush or compaction].
>
> In one of our workloads, we periodically want to do rollups on recent 15
> minutes of data in a column family... but the timerange hint benefit is lost
> midway when this resetScannerStack (shown below) happens. And end result-- we
> end up reading all the old HFiles rather than just the recent HFiles.
>
> {code}
> private void resetScannerStack(KeyValue lastTopKey) throws IOException {
>   if (heap != null) {
>     throw new RuntimeException("StoreScanner.reseek run on an existing heap!");
>   }
>   /* When we have the scan object, should we not pass it to getScanners()
>    * to get a limited set of scanners? We did so in the constructor and we
>    * could have done it now by storing the scan object from the constructor
>    */
>   List<KeyValueScanner> scanners = getScanners();
> {code}
>
> The comment in the code seems to be aware of this issue and even has the
> suggested fix!
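
For readers following the thread, here is a minimal, self-contained sketch of the idea in the quoted code comment: keep the scan's time range from the constructor and apply it again when the scanner stack is rebuilt. It is only an illustration under stated assumptions. `ScannerResetSketch`, `FileInfo`, `TimeRange`, and both `filesAfter...` methods are hypothetical stand-ins, not HBase classes or the actual patch, and the half-open range convention is assumed for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the problem. None of these are real HBase classes.
class ScannerResetSketch {

    /** Stand-in for an HFile, described by the min/max cell timestamps it holds. */
    static final class FileInfo {
        final String name;
        final long minTs;
        final long maxTs;

        FileInfo(String name, long minTs, long maxTs) {
            this.name = name;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    /** Stand-in for a scan's time range, assumed here to be [minTs, maxTs). */
    static final class TimeRange {
        final long minTs;
        final long maxTs;

        TimeRange(long minTs, long maxTs) {
            this.minTs = minTs;
            this.maxTs = maxTs;
        }

        /** True if the file may contain cells inside this range. */
        boolean overlaps(FileInfo f) {
            return f.maxTs >= minTs && f.minTs < maxTs;
        }
    }

    private final List<FileInfo> storeFiles;
    // Stored at construction time so that a later reset can reuse it.
    private final TimeRange range;

    ScannerResetSketch(List<FileInfo> storeFiles, TimeRange range) {
        this.storeFiles = storeFiles;
        this.range = range;
    }

    /** The behavior the issue describes: the reset ignores the scan and opens every file. */
    List<FileInfo> filesAfterUnfilteredReset() {
        return new ArrayList<>(storeFiles);
    }

    /** The direction the code comment suggests: reapply the stored time range on reset. */
    List<FileInfo> filesAfterFilteredReset() {
        List<FileInfo> selected = new ArrayList<>();
        for (FileInfo f : storeFiles) {
            if (range.overlaps(f)) {
                selected.add(f);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<FileInfo> files = Arrays.asList(
                new FileInfo("old-1", 0L, 100L),
                new FileInfo("old-2", 100L, 500L),
                new FileInfo("recent", 900L, 1000L));
        ScannerResetSketch store = new ScannerResetSketch(files, new TimeRange(950L, 2000L));

        System.out.println("files opened after unfiltered reset: "
                + store.filesAfterUnfilteredReset().size());
        System.out.println("files opened after filtered reset: "
                + store.filesAfterFilteredReset().size());
    }
}
```

In this toy model the unfiltered reset reopens all three files, while the filtered reset reopens only the one overlapping the requested range. That is the difference between reading all the old HFiles and reading just the recent ones. Bloom filter pruning is not modeled here.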