[
https://issues.apache.org/jira/browse/HBASE-8073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015234#comment-14015234
]
Hadoop QA commented on HBASE-8073:
----------------------------------
{color:green}+1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12647858/HBASE-8073-trunk-v1.patch
against trunk revision .
ATTACHMENT ID: 12647858
{color:green}+1 @author{color}. The patch does not contain any @author
tags.
{color:green}+1 tests included{color}. The patch appears to include 6 new
or modified tests.
{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any
warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines
longer than 100 characters.
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/9662//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/9662//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/9662//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/9662//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/9662//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/9662//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/9662//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/9662//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/9662//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/9662//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/9662//console
This message is automatically generated.
> HFileOutputFormat support for offline operation
> -----------------------------------------------
>
> Key: HBASE-8073
> URL: https://issues.apache.org/jira/browse/HBASE-8073
> Project: HBase
> Issue Type: Sub-task
> Components: mapreduce
> Reporter: Nick Dimiduk
> Fix For: 0.99.0
>
> Attachments: HBASE-8073-trunk-v0.patch, HBASE-8073-trunk-v1.patch
>
>
> When HFileOutputFormat is used to generate HFiles, it inspects the region
> topology of the target table. The split points from that table are used to
> guide the TotalOrderPartitioner. If the target table does not exist, it is
> first created. This imposes an unnecessary dependence on an online HBase
> cluster and an existing table.
> If the table exists, it can be used. However, the job can be smarter. For
> example, if there's far more data going into the HFiles than the table
> currently contains, the table regions aren't very useful for data split
> points. Instead, the input data can be sampled to produce split points more
> meaningful to the dataset. LoadIncrementalHFiles is already capable of
> handling divergence between HFile boundaries and table regions, so this
> should not pose any additional burden at load time.
> The proper method of sampling the data likely requires a custom input format
> and an additional map-reduce job to perform the sampling. See a relevant
> implementation:
> https://github.com/alexholmes/hadoop-book/blob/master/src/main/java/com/manning/hip/ch4/sampler/ReservoirSamplerInputFormat.java
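To illustrate the sampling idea in the description above, here is a minimal
sketch in plain Java, assuming row keys can be read as a stream of comparable
values. It is not taken from the HBASE-8073 patch: the class
SplitPointSampler and its methods reservoirSample and splitPoints are
hypothetical names, and a real job would draw the sample through a MapReduce
input format rather than an in-memory iterator. The sketch keeps a uniform
reservoir sample of the keys, sorts it, and picks evenly spaced keys as
candidate split points for a total-order partitioner.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

/** Hypothetical helper for illustration; not part of HBase or the attached patch. */
final class SplitPointSampler {

  /** Reservoir sampling (Algorithm R): a uniform sample of up to k keys from a stream. */
  static <T> List<T> reservoirSample(Iterator<T> keys, int k, Random rng) {
    if (k <= 0) {
      throw new IllegalArgumentException("k must be positive");
    }
    List<T> reservoir = new ArrayList<>(k);
    long seen = 0;
    while (keys.hasNext()) {
      T key = keys.next();
      seen++;
      if (reservoir.size() < k) {
        // Fill the reservoir with the first k keys.
        reservoir.add(key);
      } else {
        // Pick a slot in [0, seen); the new key survives with probability k / seen.
        long slot = (long) (rng.nextDouble() * seen);
        if (slot < k) {
          reservoir.set((int) slot, key);
        }
      }
    }
    return reservoir;
  }

  /** Picks up to numPartitions - 1 strictly increasing split points from the sample. */
  static <T extends Comparable<? super T>> List<T> splitPoints(List<T> sample,
      int numPartitions) {
    if (numPartitions < 1) {
      throw new IllegalArgumentException("numPartitions must be at least 1");
    }
    List<T> sorted = new ArrayList<>(sample);
    Collections.sort(sorted);
    List<T> splits = new ArrayList<>();
    if (sorted.isEmpty()) {
      return splits;
    }
    for (int i = 1; i < numPartitions; i++) {
      // Evenly spaced quantile of the sorted sample; always a valid index.
      int idx = (int) ((long) i * sorted.size() / numPartitions);
      T candidate = sorted.get(idx);
      // Skip duplicates so that no partition has an empty key range.
      if (splits.isEmpty() || splits.get(splits.size() - 1).compareTo(candidate) < 0) {
        splits.add(candidate);
      }
    }
    return splits;
  }

  public static void main(String[] args) {
    List<String> keys = new ArrayList<>();
    for (int i = 0; i < 10000; i++) {
      keys.add(String.format("row%05d", i));
    }
    List<String> sample = reservoirSample(keys.iterator(), 500, new Random());
    System.out.println(splitPoints(sample, 8));
  }
}
```

Because the split points come from the data rather than from the table, they
may not line up with existing region boundaries; per the description,
LoadIncrementalHFiles already handles that divergence at load time.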
--
This message was sent by Atlassian JIRA
(v6.2#6252)