[ https://issues.apache.org/jira/browse/HBASE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118444#comment-13118444 ]
jirapos...@reviews.apache.org commented on HBASE-4496:
------------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2136/#review2232
-----------------------------------------------------------

Ship it!

I love it. This is what I should have done with HBASE-4496 if I had had more knowledge about the reader code.
I'll do some more manual testing with your patch applied.

This will create extra merging work for HBASE-4422 and HBASE-4344

src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java
<https://reviews.apache.org/r/2136/#comment5184>

    This is good.

src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
<https://reviews.apache.org/r/2136/#comment5182>

    I like this. HFileReaderV2 implementing HFileBlock.BasicReader was strange.

src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
<https://reviews.apache.org/r/2136/#comment5183>

    No more casting, awesome.
    *Very* minor nit, but why not do
    reader.getDataBlockIndexReader().seekToDataBlock(...)
    as you do below?

- Lars

On 2011-09-30 20:41:01, Mikhail Bautin wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/2136/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-09-30 20:41:01)
bq.
bq.
bq. Review request for hbase, Jonathan Gray and Lars Hofhansl.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. This fixes a couple of long-existing code issues in HFile v2:
bq. - Making seekBefore cache the previous block it has to read when the scanner happens to be at the first key of a block (this was a performance regression introduced in HFile v2).
bq. - Fixing the accounting of the number of blocks read for the one-level index case in HFileBlockIndex.seekToDataBlock if the current block is the same as the requested block.
bq. - Getting rid of HFileBlock.BasicReader, which was used both by FSReaderV2 and HFileReaderV2, but the former did not cache blocks (a source of confusion).
bq. - Adding a new interface HFile.CachingBlockReader instead, which is implemented by HFile readers and passed to HFileBlockIndex.
bq.
bq.
bq. This addresses bug HBASE-4496.
bq. https://issues.apache.org/jira/browse/HBASE-4496
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java 4dc1367
bq. src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java 5e98375
bq. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java b429819
bq. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java 953896e
bq. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 13d5e70
bq. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 1cf7767
bq. src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java eec566e
bq.
bq. Diff: https://reviews.apache.org/r/2136/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. This is in production in Facebook's hbase-89 branch.
bq.
bq. Still testing this open-source patch -- please don't commit yet.
bq.
bq.
bq. Thanks,
bq.
bq. Mikhail
bq.
bq.

> HFile V2 does not honor setCacheBlocks when scanning.
> -----------------------------------------------------
>
> Key: HBASE-4496
> URL: https://issues.apache.org/jira/browse/HBASE-4496
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 0.92.0, 0.94.0
> Reporter: Lars Hofhansl
> Assignee: Lars Hofhansl
> Fix For: 0.92.0, 0.94.0
>
> Attachments: 4496.txt
>
>
> While testing the LRU cache during the scanning I noticed quite some churn in
> the cache even when Scan.cacheBlocks is set to false. After debugging this, I
> found that HFile V2 always caches blocks in the LRU cache regardless of the
> cacheBlocks setting.
> Here's a trace (from Eclipse) showing the problem:
> HFileReaderV2.readBlock(long, int, boolean, boolean, boolean) line: 279
> HFileReaderV2.readBlockData(long, long, int, boolean) line: 219
> HFileBlockIndex$BlockIndexReader.seekToDataBlock(byte[], int, int, HFileBlock) line: 191
> HFileReaderV2$ScannerV2.seekTo(byte[], int, int, boolean) line: 502
> HFileReaderV2$ScannerV2.reseekTo(byte[], int, int) line: 539
> StoreFileScanner.reseekAtOrAfter(HFileScanner, KeyValue) line: 151
> StoreFileScanner.reseek(KeyValue) line: 110
> KeyValueHeap.reseek(KeyValue) line: 255
> StoreScanner.reseek(KeyValue) line: 409
> StoreScanner.next(List<KeyValue>, int) line: 304
> KeyValueHeap.next(List<KeyValue>, int) line: 114
> KeyValueHeap.next(List<KeyValue>) line: 143
> HRegion$RegionScannerImpl.nextRow(byte[]) line: 2774
> HRegion$RegionScannerImpl.nextInternal(int) line: 2722
> HRegion$RegionScannerImpl.next(List<KeyValue>, int) line: 2682
> HRegion$RegionScannerImpl.next(List<KeyValue>) line: 2699
> HRegionServer.next(long, int) line: 2092
> Every scanner.next causes a reseek, which eventually causes a call to
> HFileBlockIndex$BlockIndexReader.seekToDataBlock(...) at which point the
> cacheBlocks information is lost. HFileReaderV2.readBlockData calls
> HFileReaderV2.readBlock with cacheBlocks set unconditionally to true.
> The fix is not immediately clear, unless we want to pass cacheBlocks to
> HFileBlockIndex$BlockIndexReader.seekToDataBlock and then on to
> HFileBlock.BasicReader.readBlockData and all its implementers, which is ugly
> as readBlockData should not care about caching.
> Avoiding caching during scans is somewhat important for us.

--
This message is automatically generated by JIRA.
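Illustrative note: the issue description above says the cacheBlocks flag is lost by the time seekToDataBlock reads a block, and the review summary proposes passing a caching-aware reader to the block index. The following is a minimal toy sketch of that idea, not the actual patch. The interface name CachingBlockReader is borrowed from the summary, but its method signature here is an assumption, and ToyReader, ToyBlockIndex, the long-valued keys, and the fixed block size are all hypothetical simplifications.

```java
import java.util.HashMap;
import java.util.Map;

public class CacheBlocksSketch {

    /** Hypothetical, simplified stand-in; the real interface's signature may differ. */
    interface CachingBlockReader {
        byte[] readBlock(long offset, boolean cacheBlock);
    }

    /** Toy reader: block contents are faked and the block cache is a plain map. */
    static class ToyReader implements CachingBlockReader {
        private final Map<Long, byte[]> blockCache = new HashMap<>();

        @Override
        public byte[] readBlock(long offset, boolean cacheBlock) {
            byte[] cached = blockCache.get(offset);
            if (cached != null) {
                return cached;
            }
            byte[] block = new byte[] {(byte) offset}; // fake "disk" read
            if (cacheBlock) {
                blockCache.put(offset, block); // only cache when the caller asked for it
            }
            return block;
        }

        int cachedBlocks() {
            return blockCache.size();
        }
    }

    /** Toy one-level index: maps a key to a block offset and reads through the reader. */
    static class ToyBlockIndex {
        private final long blockSize;

        ToyBlockIndex(long blockSize) {
            this.blockSize = blockSize;
        }

        byte[] seekToDataBlock(long key, boolean cacheBlocks, CachingBlockReader reader) {
            long offset = (key / blockSize) * blockSize;
            // The caller's flag is forwarded rather than hard-coded to true.
            return reader.readBlock(offset, cacheBlocks);
        }
    }

    public static void main(String[] args) {
        ToyBlockIndex index = new ToyBlockIndex(10);

        ToyReader scanReader = new ToyReader();
        for (long key = 0; key < 30; key++) {
            index.seekToDataBlock(key, false, scanReader); // scan with cacheBlocks = false
        }
        System.out.println("cacheBlocks=false, cached blocks: " + scanReader.cachedBlocks());

        ToyReader cachingReader = new ToyReader();
        for (long key = 0; key < 30; key++) {
            index.seekToDataBlock(key, true, cachingReader); // scan with cacheBlocks = true
        }
        System.out.println("cacheBlocks=true, cached blocks: " + cachingReader.cachedBlocks());
    }
}
```

In this toy, the index never decides on caching itself; it only forwards the flag to whatever reader it is handed, which is one way to keep the caching decision out of the low-level block-reading path.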