[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946197#comment-14946197 ]
Ben Lau commented on HBASE-14283:
---------------------------------

Hey guys, sorry, I should be able to get back to this soon. Finishing up an unrelated project right now.

I didn't know that minor versions in HFiles were also non-backwards compatible. That's one less reason then to make this a major version bump. If anyone has a strong preference for this fix to go into a V3.X, I can change the patch to use the minor version (e.g. for header size calculation) when I have time to do it. If not, I'll leave it as V4 since it's a little simpler in the code as a major version bump.

My original intention, btw, if it wasn't clear, was that this wouldn't be the only change in a V4, just the first change that would go into a V4, whose format/contents are not yet meant to be final even when this patch is committed, i.e. V4 would be essentially a WIP with more changes suggested and implemented in other tickets and eventually released in HBase 2.0.

[~anoop.hbase] I'm down for committing a short-term read-the-header-always fix for now and then discussing the longer term solution second. Which branches do you want the patch for?

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --------------------------------------------------------------
>
>                 Key: HBASE-14283
>                 URL: https://issues.apache.org/jira/browse/HBASE-14283
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ben Lau
>            Assignee: Ben Lau
>         Attachments: HBASE-14283-v2.patch, HBASE-14283.patch,
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf
> level index blocks.  The reason is that the seekBefore() call calculates
> the previous data block’s size by assuming data blocks are contiguous, which
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile
> version change, but is only performant for 1 and 2-level indexes and not 3+.
> 3+ requires an HFile format update for optimal performance.
>
> This patch does not fix the bloom filter blocks bug.  But the fix should be
> similar to the case of inline index blocks.  The reason I haven’t made the
> change yet is I want to confirm that you guys would be fine with me revising
> the HFile.Reader interface.
>
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the
> HFileReader class doesn’t have a reference to the bloom filters (and hence
> their indices) and only constructs the IO streams and hence has no way to
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader
> bloom method comments state that they “know nothing about how that metadata
> is structured” but I do not know if that is a requirement of the abstraction
> (why?) or just an incidental current property.
>
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’
> field in the block header in the next HFile version, so that seekBefore()
> calls can not only be correct but performant in all cases.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
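
To illustrate the seekBefore() problem described in the quoted issue, here is a minimal Java sketch of why the contiguity assumption breaks once an inline block sits between two data blocks. It is not HBase code: the class SeekBeforeSketch, the Block type, the block kinds, and the offsets and sizes are all hypothetical and chosen only for illustration. It compares a size derived from offsets with a size stored alongside the block, which roughly stands in for reading the block header.

```java
/** Hypothetical model of an HFile block layout; these are not HBase's real classes. */
final class SeekBeforeSketch {
    static final int DATA = 0;        // data block
    static final int LEAF_INDEX = 1;  // inline leaf-level index block

    /** One on-disk block: its kind, its file offset, and its on-disk size in bytes. */
    static final class Block {
        final int kind;
        final long offset;
        final int size;

        Block(int kind, long offset, int size) {
            this.kind = kind;
            this.offset = offset;
            this.size = size;
        }
    }

    /** Index of the nearest data block that precedes blocks[current]. */
    static int previousDataBlock(Block[] blocks, int current) {
        for (int i = current - 1; i >= 0; i--) {
            if (blocks[i].kind == DATA) {
                return i;
            }
        }
        throw new IllegalArgumentException("no data block before index " + current);
    }

    /** Buggy estimate: assumes the previous data block ends exactly where the current one starts. */
    static long sizeAssumingContiguous(Block[] blocks, int current) {
        int prev = previousDataBlock(blocks, current);
        return blocks[current].offset - blocks[prev].offset;
    }

    /** Size taken from the previous data block itself, as a header read would provide. */
    static long sizeFromHeader(Block[] blocks, int current) {
        return blocks[previousDataBlock(blocks, current)].size;
    }

    /** Made-up layout: data block, inline leaf index block, data block. */
    static Block[] sampleLayout() {
        return new Block[] {
            new Block(DATA, 0L, 100),
            new Block(LEAF_INDEX, 100L, 40),
            new Block(DATA, 140L, 100),
        };
    }

    public static void main(String[] args) {
        Block[] blocks = sampleLayout();
        System.out.println(sizeAssumingContiguous(blocks, 2)); // prints 140
        System.out.println(sizeFromHeader(blocks, 2));         // prints 100
    }
}
```

In this made-up layout the offset-based estimate is too large by the 40 bytes of the inline index block, so a reader trusting it would try to read past the end of the previous data block. A stored size, such as the ‘prevBlockSize’ header field proposed in the issue, would avoid having to infer the size from neighbouring offsets.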