[ 
https://issues.apache.org/jira/browse/HBASE-21260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16649714#comment-16649714
 ] 

Hadoop QA commented on HBASE-21260:
-----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  
0m  0s{color} | {color:orange} The patch doesn't appear to include any new or 
modified tests. Please justify why no new tests are needed for this patch. Also 
please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
31s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
52s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
17s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
40s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
17s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green} branch-2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
46s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  1m 46s{color} 
| {color:red} hbase-server generated 1 new + 187 unchanged - 1 fixed = 188 
total (was 188) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
46s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
9m  1s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 
or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}130m 20s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}171m 34s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hbase.master.procedure.TestMasterFailoverWithProcedures |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:42ca976 |
| JIRA Issue | HBASE-21260 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12943178/HBASE-21260.branch-2.002.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux a766b49b3c1c 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | branch-2 / 6125872f48 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| javac | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14694/artifact/patchprocess/diff-compile-javac-hbase-server.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14694/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14694/testReport/ |
| Max. process+thread count | 4503 (vs. ulimit of 10000) |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14694/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> The whole balancer plans might be aborted if there are more than one plans to 
> move a same region 
> -------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-21260
>                 URL: https://issues.apache.org/jira/browse/HBASE-21260
>             Project: HBase
>          Issue Type: Bug
>          Components: Balancer, master
>            Reporter: Xiaolin Ha
>            Assignee: Xiaolin Ha
>            Priority: Major
>             Fix For: 3.0.0, 2.2.0
>
>         Attachments: HBASE-21260.branch-2.001.patch, 
> HBASE-21260.branch-2.002.patch
>
>
> In SimpleLoadBalancer, plans are first generated from the average number of 
> regions per server for a table. Each server is randomly assigned either 
> floor(average) or ceiling(average) regions (if the average is not an 
> integer). Afterwards, however, the balanceOverall method might generate new 
> plans for some regions of the table to balance server load across the whole 
> cluster. As a result, a single call of balance can contain more than one 
> plan to move the same region. 
> Currently, branch-2 uses async procedures to carry out balancer plans. 
> Concurrent moves of the same region cause the balance method to fail, and 
> once one plan hits an exception, none of the remaining plans are executed.
> We have hit this problem in practice; the logs are as follows:
> {color:#205081}2018-09-26,12:12:38,224 INFO 
> [master/c4-hadoop-tst-ct15:52900.Chore.1] 
> org.apache.hadoop.hbase.master.HMaster: Balancer plans size is 3757, the 
> balance interval is 79 ms, and the max number regions in transition is 25
> 2018-09-26,12:12:38,224 INFO [master/c4-hadoop-tst-ct15:52900.Chore.1] 
> org.apache.hadoop.hbase.master.HMaster: balance hri=1588230740, 
> source=c4-hadoop-tst-st99.bj,52900,1537522783781, 
> destination=c4-hadoop-tst-st28.bj,52900,1537520009497
> 2018-09-26,12:12:38,325 INFO [master/c4-hadoop-tst-ct15:52900.Chore.1] 
> org.apache.hadoop.hbase.master.HMaster: balance hri=1588230740, 
> source=c4-hadoop-tst-st99.bj,52900,1537522783781, 
> destination=c4-hadoop-tst-st29.bj,52900,1537522784188
> 2018-09-26,12:12:38,325 INFO [PEWorker-16] 
> org.apache.hadoop.hbase.master.procedure.MasterProcedureScheduler: 
> pid=119197, state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE; 
> TransitRegionStateProcedure table=hbase:meta, region=1588230740, REOPEN/MOVE 
> checking lock on 1588230740
> 2018-09-26,12:12:38,325 ERROR [master/c4-hadoop-tst-ct15:52900.Chore.1] 
> org.apache.hadoop.hbase.master.balancer.BalancerChore: Failed to balance.
> org.apache.hadoop.hbase.HBaseIOException: rit=OPEN, 
> location=c4-hadoop-tst-st99.bj,52900,1537522783781, table=hbase:meta, 
> region=1588230740 is currently in transition
>         at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.preTransitCheck(AssignmentManager.java:536)
>         at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.createMoveRegionProcedure(AssignmentManager.java:592)
>         at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.moveAsync(AssignmentManager.java:609)
>         at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:1707)
>         at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:1622)
>         at 
> org.apache.hadoop.hbase.master.balancer.BalancerChore.chore(BalancerChore.java:49)
>         at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:186)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>         at 
> org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:111)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745){color}
> This is a serious problem because it often occurs when new RSs start or old 
> RSs fail over. What's more, there is no effective way to bring the balance 
> of the cluster back to normal.
> The solution may be simple, though. We can catch exceptions when executing a 
> plan and then just skip that plan, so that a failed plan does not affect the 
> later plans in the list. Subsequent calls of balance can pick up the failed 
> and skipped plans.
>  
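
The skip-on-failure idea in the description above can be sketched roughly as follows. This is a minimal illustration only, not the attached patch: Plan, Mover and executePlans are hypothetical stand-ins and are not HBase classes or methods. The loop catches the exception for one plan, remembers the plan as skipped, and carries on with the rest of the list.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class BalancePlanSketch {
  /** Hypothetical stand-in for a region move plan (not an HBase class). */
  static final class Plan {
    final String region;
    final String destination;

    Plan(String region, String destination) {
      this.region = region;
      this.destination = destination;
    }
  }

  /** Hypothetical stand-in for whatever submits a single move. */
  interface Mover {
    void moveAsync(Plan plan) throws Exception;
  }

  /**
   * Submits every plan in order. A plan that throws is recorded and skipped
   * instead of aborting the loop, so the later plans are still submitted.
   * Returns the skipped plans so that a later balance run can retry them.
   */
  static List<Plan> executePlans(List<Plan> plans, Mover mover) {
    List<Plan> skipped = new ArrayList<>();
    for (Plan plan : plans) {
      try {
        mover.moveAsync(plan);
      } catch (Exception e) {
        // Assumption: any failure of one move is treated as non-fatal here.
        skipped.add(plan);
      }
    }
    return skipped;
  }

  public static void main(String[] args) {
    // Simulate "region is currently in transition" for a second move of
    // the same region within one balance call.
    Set<String> inTransition = new HashSet<>();
    Mover mover = plan -> {
      if (!inTransition.add(plan.region)) {
        throw new IOException(plan.region + " is currently in transition");
      }
    };
    List<Plan> plans = Arrays.asList(
        new Plan("region-a", "server-1"),
        new Plan("region-a", "server-2"),
        new Plan("region-b", "server-3"));
    List<Plan> skipped = executePlans(plans, mover);
    System.out.println("skipped plans: " + skipped.size());
  }
}
```

In this sketch the second plan for region-a is skipped while the plan for region-b is still submitted. Whether skipped plans are retried, logged, or simply regenerated by the next balancer run is a design choice the sketch leaves open.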



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
