[
https://issues.apache.org/jira/browse/HBASE-12590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249605#comment-14249605
]
Hadoop QA commented on HBASE-12590:
-----------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12687680/HBASE-12590-v4.patch
against master branch at commit 99a11390b4758c211af04af2ca0696ac6e3e0aeb.
ATTACHMENT ID: 12687680
{color:green}+1 @author{color}. The patch does not contain any @author
tags.
{color:green}+1 tests included{color}. The patch appears to include 6 new
or modified tests.
{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any
warning messages.
{color:red}-1 checkstyle{color}. The applied patch generated
2086 checkstyle errors (more than the master's current 2084 errors).
{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines
longer than 100
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.regionserver.TestPerColumnFamilyFlush
Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/12105//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/12105//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/12105//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/12105//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/12105//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/12105//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/12105//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/12105//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/12105//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/12105//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/12105//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/12105//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors:
https://builds.apache.org/job/PreCommit-HBASE-Build/12105//artifact/patchprocess/checkstyle-aggregate.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/12105//console
This message is automatically generated.
> A solution for data skew in HBase-Mapreduce Job
> -----------------------------------------------
>
> Key: HBASE-12590
> URL: https://issues.apache.org/jira/browse/HBASE-12590
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce
> Reporter: Weichen Ye
> Attachments: A Solution for Data Skew in HBase-MapReduce Job
> (Version2).pdf, A Solution for Data Skew in HBase-MapReduce Job
> (Version3).pdf, HBASE-12590-v3.patch, HBASE-12590-v4.patch,
> HBase-12590-v1.patch, HBase-12590-v2.patch
>
>
> 1, Motivation
> In production environments, data skew is very common. An HBase table may
> contain many small regions and several large regions. Small regions
> waste computing resources: a job that scans a table with 3000 small
> regions needs 3000 mappers. Large regions stall the job: if one region
> in a 100-region table is far larger than the other 99, then when a job
> takes the table as input, 99 mappers complete very quickly and the job
> must wait a long time for the last mapper.
> 2, Configuration
> Add three new configuration properties:
> hbase.mapreduce.input.autobalance = true enables “auto balance” in
> HBase-MapReduce jobs. The default value is false.
> hbase.mapreduce.input.autobalance.maxskewratio = 3 (the default). If a
> region is larger than 3x the average region size, it is treated as
> “proportionately too large”.
> hbase.table.row.textkey = true means the row key is text; false means a
> binary row key. It is used to find the middle row key in a large region.
> The default value is true.
> If (region size >= average size * ratio): cut the region into two MR input
> splits.
> If (average size <= region size < average size * ratio): use the region as
> one MR input split.
> If (sum of the sizes of several contiguous regions < average size): combine
> these regions into one MR input split.
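
The three rules quoted above can be sketched in plain Java. This is only an
illustration under stated assumptions, not the code in HBASE-12590-v4.patch:
the names AutoBalanceSketch, Split, and plan are hypothetical, region sizes
are passed in as a plain array instead of being read from HBase, and a large
region is only recorded as two "half" splits rather than actually cut at a
middle row key. The description does not say what happens to a small region
that cannot be combined with its neighbour; the sketch assumes it becomes a
split of its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of the split-planning rules; not the patch's code. */
class AutoBalanceSketch {

    /** One planned MR input split and the region indexes it covers. */
    static final class Split {
        final String kind;            // "half", "whole", or "combined"
        final List<Integer> regions;  // indexes into the regionSizes array

        Split(String kind, List<Integer> regions) {
            this.kind = kind;
            this.regions = regions;
        }
    }

    /**
     * Plans splits for regions listed in table order.
     * maxSkewRatio plays the role of hbase.mapreduce.input.autobalance.maxskewratio.
     */
    static List<Split> plan(long[] regionSizes, double maxSkewRatio) {
        List<Split> splits = new ArrayList<>();
        int n = regionSizes.length;
        if (n == 0) {
            return splits;
        }
        long total = 0;
        for (long size : regionSizes) {
            total += size;
        }
        double average = (double) total / n;

        int i = 0;
        while (i < n) {
            long size = regionSizes[i];
            if (size >= average * maxSkewRatio) {
                // Rule 1: a region that is too large becomes two splits.
                List<Integer> one = Collections.singletonList(i);
                splits.add(new Split("half", one));
                splits.add(new Split("half", one));
                i++;
            } else if (size >= average) {
                // Rule 2: a region of ordinary size is one split.
                splits.add(new Split("whole", Collections.singletonList(i)));
                i++;
            } else {
                // Rule 3: merge contiguous small regions while the running
                // sum stays below the average size.
                // Assumption: a small region that cannot be merged stays alone.
                List<Integer> group = new ArrayList<>();
                long sum = 0;
                do {
                    group.add(i);
                    sum += regionSizes[i];
                    i++;
                } while (i < n && sum + regionSizes[i] < average);
                splits.add(new Split(group.size() > 1 ? "combined" : "whole", group));
            }
        }
        return splits;
    }
}
```

For example, with nine regions of size 10 followed by one region of size 310
and a ratio of 3, the average is 40 and the "too large" threshold is 120. The
sketch groups the small regions three at a time and halves the large one, so
it plans 5 splits where one split per region would give 10.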
> Example:
> See the attachment.
> Reviews are welcome on the Review Board:
> https://reviews.apache.org/r/28494/diff/#
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)