[ https://issues.apache.org/jira/browse/HBASE-16894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159305#comment-16159305 ]
Hadoop QA commented on HBASE-16894: ----------------------------------- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 13m 13s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 7s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 33s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 34m 59s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha4. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 42s{color} | {color:red} hbase-mapreduce generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 55s{color} | {color:green} hbase-mapreduce in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 8s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 64m 12s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hbase-mapreduce | | | Integral division result cast to double or float in org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.calculateAutoBalancedSplits(List, long) At TableInputFormatBase.java:double or float in org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.calculateAutoBalancedSplits(List, long) At TableInputFormatBase.java:[line 462] | | | The method name org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.CreateNInputSplits(InputSplit, int) doesn't start with a lower case letter At TableInputFormatBase.java:doesn't start with a lower case letter At TableInputFormatBase.java:[lines 363-412] | \\ \\ || Subsystem || Report/Notes || | Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:5d60123 | | JIRA Issue | HBASE-16894 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12886148/HBASE-12590-v1.patch | | Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile | | uname | Linux 2756b2524f13 3.13.0-123-generic #172-Ubuntu SMP Mon Jun 26 18:04:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 77ca743 | | Default Java | 1.8.0_144 | | findbugs | v3.1.0-RC3 | | findbugs | https://builds.apache.org/job/PreCommit-HBASE-Build/8526/artifact/patchprocess/new-findbugs-hbase-mapreduce.html | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/8526/testReport/ | | modules | C: hbase-mapreduce U: hbase-mapreduce | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/8526/console | | Powered by | Apache Yetus 0.4.0 http://yetus.apache.org | This message was automatically generated. > Create more than 1 split per region, generalize HBASE-12590 > ----------------------------------------------------------- > > Key: HBASE-16894 > URL: https://issues.apache.org/jira/browse/HBASE-16894 > Project: HBase > Issue Type: Improvement > Affects Versions: 3.0.0, 2.0.0-alpha-2 > Reporter: Enis Soztutar > Assignee: Yi Liang > Fix For: 3.0.0, 2.0.0-alpha-3 > > Attachments: HBASE-12590-v1.patch, ImplementaionAndSomeQuestion.docx > > > A common request from users is to be able to better control how many map > tasks are created per region. Right now, it is always 1 region = 1 input > split = 1 map task. Same goes for Spark since it uses the TIF. With region > sizes as large as 50 GBs, it is desirable to be able to create more than 1 > split per region. > HBASE-12590 adds a config property for MR jobs to be able to handle skew in > region sizes. The algorithm is roughly: > {code} > If (region size >= average size*ratio) : cut the region into two MR input > splits > If (average size <= region size < average size*ratio) : one region as one MR > input split > If (sum of several continuous regions size < average size * ratio): combine > these regions into one MR input split. > {code} > Although we can set data skew ratio to be 0.5 or something to abuse > HBASE-12590 into creating more than 1 split task per region, it is not ideal. > But there is no way to create more with the patch as it is. For example we > cannot create more than 2 tasks per region. > If we want to fix this properly, we should extend the approach in > HBASE-12590, and make it so that the client can specify the desired num of > mappers, or desired split size, and the TIF generates the splits based on the > current region sizes very similar to the algorithm in HBASE-12590, but a more > generic way. This also would eliminate the hand tuning of data skew ratio. > We also can think about the guidepost approach that Phoenix has in the stats > table which is used for exactly this purpose. Right now, the region can be > split into powers of two assuming uniform distribution within the region. -- This message was sent by Atlassian JIRA (v6.4.14#64029)