[ https://issues.apache.org/jira/browse/HBASE-11729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096703#comment-14096703 ]
Hadoop QA commented on HBASE-11729:
-----------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12661625/HBASE-11729.patch
against trunk revision .
ATTACHMENT ID: 12661625
{color:green}+1 @author{color}. The patch does not contain any @author
tags.
{color:green}+0 tests included{color}. The patch appears to be a
documentation patch that doesn't require tests.
{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any
warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.
{color:red}-1 lineLengths{color}. The patch introduces the following lines
longer than 100:
+ <para>As we will be discussing changes to the HFile format, it is
useful to give a short overview of the original (HFile version 1) format.</para>
+ <footnote><para>Image courtesy of Lars George, <link
xlink:href="http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html">hbase-architecture-101-storage.html</link>.</para></footnote>
+ <para>The number of entries in the block index is stored in the fixed file
trailer, and has to be passed in to the method that reads the block index. One
of the limitations of the block index in version 1 is that it does not provide
the compressed size of a block, which turns out to be necessary for
decompression. Therefore, the HFile reader has to infer this compressed size
from the offset difference between blocks. We fix this limitation in version 2,
where we store on-disk block size instead of uncompressed size, and get
uncompressed size from the block header.</para>
+ <para>We found it necessary to revise the HFile format after encountering
high memory usage and slow startup times caused by large Bloom filters and
block indexes in the region server. Bloom filters can get as large as 100 MB
per HFile, which adds up to 2 GB when aggregated over 20 regions. Block indexes
can grow as large as 6 GB in aggregate size over the same set of regions. A
region is not considered opened until all of its block index data is loaded.
Large Bloom filters produce a different performance problem: the first get
request that requires a Bloom filter lookup will incur the latency of loading
the entire Bloom filter bit array.</para>
+ <para>To speed up region server startup we break Bloom filters and block
indexes into multiple blocks and write those blocks out as they fill up, which
also reduces the HFile writer's memory footprint. In the Bloom filter case,
"filling up a block" means accumulating enough keys to efficiently utilize
a fixed-size bit array, and in the block index case we accumulate an "index
block" of the desired size. Bloom filter blocks and index blocks (we call
these "inline blocks") become interspersed with data blocks, and as a side
effect we can no longer rely on the difference between block offsets to
determine data block length, as it was done in version 1.</para>
+ <para>HFile is a low-level file format by design, and it should not deal
with application-specific details such as Bloom filters, which are handled at
StoreFile level. Therefore, we call Bloom filter blocks in an HFile "inline"
blocks. We also supply HFile with an interface to write those inline blocks.
</para>
+ <para>Another format modification aimed at reducing the region server
startup time is to use a contiguous "load-on-open" section that has to be
loaded in memory at the time an HFile is being opened. Currently, as an HFile
opens, there are separate seek operations to read the trailer, data/meta
indexes, and file info. To read the Bloom filter, there are two more seek
operations for its "data" and "meta" portions. In version 2, we seek
once to read the trailer and seek again to read everything else we need to open
the file from a contiguous block.</para></section>
+ <para>The version of HBase introducing the above features reads both
version 1 and 2 HFiles, but only writes version 2 HFiles. A version 2 HFile is
structured as follows:
+ <para>8 bytes: Block type, a sequence of bytes equivalent to version
1's "magic records". Supported block types are: </para>
+ INTERMEDIATE_INDEX - intermediate-level index blocks in
a multi-level block index
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.replication.TestPerTableCFReplication
Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/10430//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/10430//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
https://builds.apache.org/job/PreCommit-HBASE-Build/10430//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
https://builds.apache.org/job/PreCommit-HBASE-Build/10430//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
https://builds.apache.org/job/PreCommit-HBASE-Build/10430//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
https://builds.apache.org/job/PreCommit-HBASE-Build/10430//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
https://builds.apache.org/job/PreCommit-HBASE-Build/10430//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
https://builds.apache.org/job/PreCommit-HBASE-Build/10430//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
https://builds.apache.org/job/PreCommit-HBASE-Build/10430//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
https://builds.apache.org/job/PreCommit-HBASE-Build/10430//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/10430//console
This message is automatically generated.
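For readers unfamiliar with the -1 lineLengths verdict: the check flags lines added by the patch that are longer than 100 characters. The Python sketch below approximates that rule on a unified diff. It is an assumption about the logic, not the actual Hadoop QA script; the function name `long_added_lines`, its `limit` parameter, and the sample patch are hypothetical.

```python
from typing import List


def long_added_lines(patch_text: str, limit: int = 100) -> List[str]:
    """Return lines added by a unified diff whose content exceeds `limit` chars.

    Hypothetical approximation of a "-1 lineLengths" style check; the real
    Hadoop QA script may differ (for example, in whether the leading '+'
    counts toward the length). An added line whose own text begins with
    "++" is also skipped here, which is acceptable for a sketch.
    """
    flagged = []
    for line in patch_text.splitlines():
        if line.startswith("+++"):
            continue  # file header such as "+++ b/path", not an added line
        # Added lines start with '+'; measure length without that marker.
        if line.startswith("+") and len(line) - 1 > limit:
            flagged.append(line)
    return flagged


# Usage with a hypothetical minimal patch:
patch = "\n".join([
    "--- a/example.xml",
    "+++ b/example.xml",
    "+<para>short line</para>",
    "+" + "x" * 101,  # added line with 101 chars of content: flagged
    " " + "y" * 200,  # context line: ignored
    "-" + "z" * 200,  # removed line: ignored
])
print(len(long_added_lines(patch)))  # 1
```

Only added lines are measured, so long context or removed lines in a patch do not trigger the verdict.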
> Document HFile v3
> -----------------
>
> Key: HBASE-11729
> URL: https://issues.apache.org/jira/browse/HBASE-11729
> Project: HBase
> Issue Type: Task
> Components: documentation, HFile
> Affects Versions: 0.98.0
> Reporter: Sean Busbey
> Assignee: Sean Busbey
> Priority: Trivial
> Labels: beginner
> Attachments: HBASE-11729.patch, HBASE-11729.pdf
>
>
> 0.98 added HFile v3. There are a couple of mentions of it in the book's
> sections on cell tags, but there isn't an actual overview or design
> explanation like there is for [HFile v2|http://hbase.apache.org/book/hfilev2.html].
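The patch text quoted under -1 lineLengths notes that the version 1 block index stores no compressed block size, so the reader infers it from the offset difference between blocks, and that inline Bloom filter and index blocks in version 2 break that inference. The minimal Python sketch below illustrates the difference under a toy file layout. `IndexEntry`, `inferred_sizes`, `recorded_sizes`, and all offsets are hypothetical and do not reflect HBase's actual HFile classes or on-disk encoding.

```python
from typing import List, NamedTuple


class IndexEntry(NamedTuple):
    """Simplified block-index entry (illustrative, not HBase's actual layout)."""
    offset: int        # where the block starts in the file
    on_disk_size: int  # what version 2 records; version 1 has no such field


def inferred_sizes(offsets: List[int], section_end: int) -> List[int]:
    """Version 1 style: size of block i = offset of block i+1 minus offset of block i.

    `section_end` (hypothetical parameter) is the offset just past the last
    indexed block, needed to size that final block.
    """
    ends = offsets[1:] + [section_end]
    return [end - start for start, end in zip(offsets, ends)]


def recorded_sizes(entries: List[IndexEntry]) -> List[int]:
    """Version 2 style: read the on-disk size stored with each entry."""
    return [e.on_disk_size for e in entries]


# Hypothetical layout: data block, inline Bloom filter block, data block.
#   [0, 100)    data block 1
#   [100, 140)  inline Bloom filter block (not in the data block index)
#   [140, 240)  data block 2
index = [IndexEntry(0, 100), IndexEntry(140, 100)]

print(inferred_sizes([e.offset for e in index], 240))  # [140, 100]: block 1 absorbs the inline block
print(recorded_sizes(index))                            # [100, 100]
```

When data blocks are contiguous, as in version 1, both approaches agree; once inline blocks sit between data blocks, only the stored on-disk size stays correct.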
--
This message was sent by Atlassian JIRA
(v6.2#6252)