[jira] [Commented] (HBASE-12590) A solution for data skew in HBase-Mapreduce Job

Hadoop QA (JIRA) Wed, 26 Nov 2014 18:24:42 -0800

    [ https://issues.apache.org/jira/browse/HBASE-12590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227150#comment-14227150 ]


Hadoop QA commented on HBASE-12590:
-----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12683981/HBase-12590-v1.patch
  against master branch at commit f0d95e7f11403d67b4fc3f1fd4ef048047b6842a.
  ATTACHMENT ID: 12683981

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 6 new or modified tests.

    {color:red}-1 javac{color}.  The patch appears to cause mvn compile goal to fail.

    Compilation errors resume:
    [ERROR] COMPILATION ERROR : 
    [ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:[45,48] cannot find symbol
    [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.2:compile (default-compile) on project hbase-server: Compilation failure
    [ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:[45,48] cannot find symbol
    [ERROR] symbol:   class HLog
    [ERROR] location: package org.apache.hadoop.hbase.regionserver.wal
    [ERROR] -> [Help 1]
    [ERROR] 
    [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
    [ERROR] Re-run Maven using the -X switch to enable full debug logging.
    [ERROR] 
    [ERROR] For more information about the errors and possible solutions, please read the following articles:
    [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
    [ERROR] 
    [ERROR] After correcting the problems, you can resume the build with the command
    [ERROR]   mvn <goals> -rf :hbase-server
    

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11848//console

This message is automatically generated.

> A solution for data skew in HBase-Mapreduce Job 
> ------------------------------------------------
>
>                 Key: HBASE-12590
>                 URL: https://issues.apache.org/jira/browse/HBASE-12590
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 2.0.0
>            Reporter: Weichen Ye
>         Attachments: A Solution for Data Skew in HBase-MapReduce Job.pdf, 
> HBase-12590-v1.patch
>
>
> 1, Motivation
> In production environments, data skew is very common. An HBase table 
> often contains many small regions and a few large ones. Small regions 
> waste computing resources: scanning a table with 3000 small regions 
> requires a job with 3000 mappers. Large regions hold up the whole job: if 
> one region in a 100-region table is far larger than the other 99, then when 
> we run a job with that table as input, 99 mappers finish very quickly and 
> we must wait a long time for the last one.
> 2, Configuration
> Add two new configuration properties:
> hbase.mapreduce.split.autobalance = true enables “auto balance” in 
> HBase-MapReduce jobs. The default value is false.
> hbase.mapreduce.split.targetsize = 1073741824 (default 1 GB) sets the target 
> size of MapReduce splits.
> If a region is larger than the target size, it is cut into two splits. If 
> the total size of several small contiguous regions is less than the target 
> size, those regions are combined into one split.
> Example:
> See the attachment.
> Reviews are welcome on Review Board:
> https://reviews.apache.org/r/28494/diff/#
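
For readers of the thread, the split-planning rule quoted above can be sketched in Java, the language of the patch. This is a minimal sketch under stated assumptions, not the code in HBase-12590-v1.patch. Region sizes are plain byte counts in row-key order, and a java.util.Properties object stands in for the Hadoop Configuration. A region above the target is cut into two halves by size; a real implementation must also pick a split row key, which is not modelled here. Neighbouring small regions are merged while their running total stays at or below the target, and that exact boundary is an assumption. AutoBalanceSketch, PlannedSplit, plan and balance are hypothetical names. Only the two property names and the 1 GB default come from the issue text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

/**
 * Hypothetical sketch of the "auto balance" split planning described in HBASE-12590.
 * Assumption: region sizes are byte counts in row-key order; Properties stands in
 * for the Hadoop Configuration. Not the patch's actual implementation.
 */
public class AutoBalanceSketch {

    /** One planned input split: the regions it covers and its estimated size. */
    public static final class PlannedSplit {
        public final int firstRegion;    // index of the first region covered (inclusive)
        public final int lastRegion;     // index of the last region covered (inclusive)
        public final long estimatedSize; // bytes this split is expected to read

        PlannedSplit(int firstRegion, int lastRegion, long estimatedSize) {
            this.firstRegion = firstRegion;
            this.lastRegion = lastRegion;
            this.estimatedSize = estimatedSize;
        }

        @Override
        public String toString() {
            return "[" + firstRegion + ".." + lastRegion + ", " + estimatedSize + "]";
        }
    }

    // Property names and default taken from the issue description.
    static final String AUTOBALANCE_KEY = "hbase.mapreduce.split.autobalance";
    static final String TARGET_SIZE_KEY = "hbase.mapreduce.split.targetsize";
    static final long DEFAULT_TARGET_SIZE = 1073741824L; // 1 GB

    /** Without auto balance: one split per region. With it: delegate to balance(). */
    public static List<PlannedSplit> plan(Properties conf, List<Long> regionSizes) {
        boolean autoBalance = Boolean.parseBoolean(conf.getProperty(AUTOBALANCE_KEY, "false"));
        long target = Long.parseLong(
                conf.getProperty(TARGET_SIZE_KEY, Long.toString(DEFAULT_TARGET_SIZE)));
        if (!autoBalance) {
            List<PlannedSplit> splits = new ArrayList<>();
            for (int i = 0; i < regionSizes.size(); i++) {
                splits.add(new PlannedSplit(i, i, regionSizes.get(i)));
            }
            return splits;
        }
        return balance(regionSizes, target);
    }

    /** Cuts regions larger than target in two; merges runs of small neighbouring regions. */
    public static List<PlannedSplit> balance(List<Long> regionSizes, long target) {
        List<PlannedSplit> splits = new ArrayList<>();
        int groupStart = -1; // first region of the pending merged group, -1 if none
        long groupSize = 0;  // total size of the pending merged group
        for (int i = 0; i < regionSizes.size(); i++) {
            long size = regionSizes.get(i);
            if (size > target) {
                // Flush any pending group, then emit this large region as two halves.
                if (groupStart >= 0) {
                    splits.add(new PlannedSplit(groupStart, i - 1, groupSize));
                    groupStart = -1;
                    groupSize = 0;
                }
                long firstHalf = size / 2;
                splits.add(new PlannedSplit(i, i, firstHalf));
                splits.add(new PlannedSplit(i, i, size - firstHalf));
                continue;
            }
            // Adding this region would overshoot the target: close the current group first.
            if (groupStart >= 0 && groupSize + size > target) {
                splits.add(new PlannedSplit(groupStart, i - 1, groupSize));
                groupStart = -1;
                groupSize = 0;
            }
            if (groupStart < 0) {
                groupStart = i;
            }
            groupSize += size;
        }
        if (groupStart >= 0) {
            splits.add(new PlannedSplit(groupStart, regionSizes.size() - 1, groupSize));
        }
        return splits;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(AUTOBALANCE_KEY, "true");
        conf.setProperty(TARGET_SIZE_KEY, "100");
        System.out.println(plan(conf, Arrays.asList(30L, 40L, 20L, 250L, 60L, 50L)));
    }
}
```

With auto balance on and a target of 100 bytes, main prints [[0..2, 90], [3..3, 125], [3..3, 125], [4..4, 60], [5..5, 50]]. The three small leading regions collapse into one split. Because the issue describes only a two-way cut, a region more than twice the target still yields splits above the target, as with the two 125-byte halves here.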



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
