[
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736047#comment-14736047
]
Anoop Sam John commented on HBASE-14283:
----------------------------------------
Going through the details of this issue and the patch.
We have the optimization of reading next block's header along with every block
to avoid 2 reads (1st header and get data size and then do second read). This
works well for forward read. In case of backward read, where we have to read
the prev block, we tried to solve the case by calculating the prev blocks size
from offset of 2 blocks (considering no other blocks in btw these 2 data
blocks).. As per the bug, this assumption is not true. Good find. If we have
the prev block data size also along with prev block offset, we were good. But
we dont have that in current HFiles. When we dont know the data size of a
block already, we are passing -1 so that it will read block by 2 reads.
Seeing the patch it is very complex. And there are additional overhead also.
Also it is not complete fix as it can not handle more than 2 level index block.
So IMHO it will be better to avoid this kind of complex fix.
We can do the fix in 2 steps.
1. Bump the minor version of the HFile and add the prev block size also along
with offset. When this info is there in HFile we can safely use that and do
read of prev block in one read.
2. For reading old files where this meta data is NOT available, just pass -1
and let it read in 2 steps. It is ok.. Any way once the patch is applied and
over the run the older files will get compacted to new file and this will have
the additional meta info.
> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --------------------------------------------------------------
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
> Issue Type: Bug
> Reporter: Ben Lau
> Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch,
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf
> level index blocks. The reason is because the seekBefore() call calculates
> the previous data block’s size by assuming data blocks are contiguous which
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile
> version change, but is only performant for 1 and 2-level indexes and not 3+.
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug. But the fix should be
> similar to the case of inline index blocks. The reason I haven’t made the
> change yet is I want to confirm that you guys would be fine with me revising
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and
> getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the
> HFileReader class doesn’t have a reference to the bloom filters (and hence
> their indices) and only constructs the IO streams and hence has no way to
> know where the bloom blocks are in the HFile. It seems that the HFile.Reader
> bloom method comments state that they “know nothing about how that metadata
> is structured” but I do not know if that is a requirement of the abstraction
> (why?) or just an incidental current property.
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’
> field in the block header in the next HFile version, so that seekBefore()
> calls can not only be correct but performant in all cases.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)