[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14909362#comment-14909362 ]
Andrew Purtell edited comment on HBASE-14283 at 9/26/15 5:33 PM:
-----------------------------------------------------------------
When we were designing tags we accepted some limitations of HFile that later
proved problematic. Specifically, we couldn't vary cell encoding on a
block-by-block basis, so even if no cells in a file use tags, we'd bloat each
cell with a short. Later we introduced a whole-file optimization for this
issue, but clearly we'd have more opportunities to employ it if we could vary
encoding strategy on a block-by-block basis. I thought about introducing an
extensible protobuf-encoded block header. It didn't make sense for the tag
serialization issue alone: we would trade one type of bloat for another, since
those additional header bytes will end up in the block cache. But if there are
multiple use cases lined up for it, a new extensible protobuf-encoded 'block
header' could be worthwhile. It would also make future block-level changes
less likely to be incompatible changes.
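
To make the trade-off concrete, here is a minimal sketch of an extensible block header. It uses a hand-rolled tag-length-value layout as a stand-in for protobuf, and the class name, field ids, and byte layout are all assumptions made for illustration, not anything in the HFile format. It shows how a reader can skip fields it predates, and that every field costs extra header bytes per block.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: field ids and layout are illustrative, not an HFile format.
final class ExtensibleHeaderSketch {
    static final byte FIELD_ENCODING = 1; // hypothetical: per-block encoding id
    static final byte FIELD_HAS_TAGS = 2; // hypothetical: whether cells carry tags

    /** Serializes fields as [bodyLength:2] then repeated [id:1][len:1][value:len]. */
    static ByteBuffer write(Map<Byte, byte[]> fields) {
        int bodyLength = 0;
        for (byte[] value : fields.values()) {
            bodyLength += 2 + value.length; // id byte + length byte + value
        }
        ByteBuffer buf = ByteBuffer.allocate(2 + bodyLength);
        buf.putShort((short) bodyLength);
        for (Map.Entry<Byte, byte[]> field : fields.entrySet()) {
            buf.put(field.getKey());
            buf.put((byte) field.getValue().length);
            buf.put(field.getValue());
        }
        buf.flip();
        return buf;
    }

    /** Returns the fields whose ids are in knownIds and skips all others. */
    static Map<Byte, byte[]> read(ByteBuffer buf, Set<Byte> knownIds) {
        Map<Byte, byte[]> fields = new LinkedHashMap<>();
        int bodyLength = buf.getShort();
        int end = buf.position() + bodyLength;
        while (buf.position() < end) {
            byte id = buf.get();
            int length = buf.get() & 0xff;
            if (knownIds.contains(id)) {
                byte[] value = new byte[length];
                buf.get(value);
                fields.put(id, value);
            } else {
                // A field added after this reader was written: step over it.
                buf.position(buf.position() + length);
            }
        }
        return fields;
    }
}
```

In this layout a header with two one-byte fields costs 8 bytes per block, which is the kind of overhead that would end up in the block cache.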
Does the introduction of something like that require a major version bump? I
think so. I'd like to see us follow semver more closely with HFile versioning,
and if we're on the same page about that, this is a major version bump because
readers built for earlier versions won't be able to handle the change.
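
As a rough illustration of the semver-style rule, a reader could compare only the major version when deciding whether it can open a file. The class and method names below are hypothetical, and the sketch assumes a reader keeps support for all older major versions, which is an assumption and not a statement about how HBase handles HFile versions.

```java
// Hypothetical sketch of a semver-style compatibility check; not HBase code.
final class FormatVersion {
    final int major;
    final int minor;

    FormatVersion(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    /**
     * Treats this version as the newest format the reader understands.
     * A file with a higher major version is rejected. A higher minor version
     * is accepted, on the assumption that minor changes only add things an
     * older reader can safely ignore.
     */
    boolean canRead(FormatVersion file) {
        return file.major <= this.major;
    }
}
```

Under this rule, a block header that older readers cannot parse has to arrive with a major bump, because those readers would otherwise accept the file and then fail on it.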
It also opens the door to interesting things like using different block
encoding strategies on a block-by-block basis, according to the
characteristics of the cells to be encoded within.
> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --------------------------------------------------------------
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
> Issue Type: Bug
> Reporter: Ben Lau
> Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch,
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or
> leaf-level index blocks. The reason is that the seekBefore() call calculates
> the previous data block’s size by assuming data blocks are contiguous, which
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require an HFile
> version change, but is only performant for 1- and 2-level indexes and not 3+.
> Indexes with 3+ levels require an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug. But the fix should be
> similar to the case of inline index blocks. The reason I haven’t made the
> change yet is I want to confirm that you guys would be fine with me revising
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and
> getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the
> HFileReader class only constructs the IO streams and holds no reference to
> the bloom filters (and hence their indices), so it has no way to know where
> the bloom blocks are in the HFile. It seems that the HFile.Reader
> bloom method comments state that they “know nothing about how that metadata
> is structured” but I do not know if that is a requirement of the abstraction
> (why?) or just an incidental current property.
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’
> field in the block header in the next HFile version, so that seekBefore()
> calls can not only be correct but performant in all cases.
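
The contiguity problem and the proposed 'prevBlockSize' field can be sketched as follows. This is a simplified in-memory model and not the attached patch: the class, the block types, and the map standing in for "read the header at this file offset" are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of an HFile block layout; not the actual HBase classes.
final class SeekBeforeSketch {
    enum Type { DATA, LEAF_INDEX, BLOOM_CHUNK }

    /** Simplified block header; prevBlockSize models the field proposed in (3). */
    static final class Block {
        final Type type;
        final long offset;
        final int size;
        final int prevBlockSize; // 0 for the first block in the file

        Block(Type type, long offset, int size, int prevBlockSize) {
            this.type = type;
            this.offset = offset;
            this.size = size;
            this.prevBlockSize = prevBlockSize;
        }
    }

    /** Lays blocks out back to back and indexes them by file offset. */
    static Map<Long, Block> layout(Type[] types, int[] sizes) {
        Map<Long, Block> byOffset = new HashMap<>();
        long offset = 0;
        int prevSize = 0;
        for (int i = 0; i < types.length; i++) {
            byOffset.put(offset, new Block(types[i], offset, sizes[i], prevSize));
            offset += sizes[i];
            prevSize = sizes[i];
        }
        return byOffset;
    }

    /** The contiguity assumption: all bytes between two data block offsets are one data block. */
    static long assumedPrevDataSize(long prevDataOffset, long currentOffset) {
        return currentOffset - prevDataOffset;
    }

    /** Walks backward one header at a time until a DATA block is found; null if there is none. */
    static Block previousDataBlock(Map<Long, Block> byOffset, Block current) {
        Block block = current;
        while (block.prevBlockSize > 0) {
            block = byOffset.get(block.offset - block.prevBlockSize);
            if (block.type == Type.DATA) {
                return block;
            }
        }
        return null;
    }
}
```

With a layout of DATA (100 bytes), LEAF_INDEX (40), BLOOM_CHUNK (30), DATA (120), the contiguity assumption reports a previous data block of 170 bytes, while the backward walk finds the real 100-byte block at offset 0 without consulting the index.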