[ 
https://issues.apache.org/jira/browse/HBASE-25225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-25225:
-----------------------------------
    Description: 
Running the same UT, TestRegionReplicaFailover, on my local PC (mvn clean test 
-Dtest=TestRegionReplicaFailover) takes 8 mins on branch-2.2 but only 2 mins on 
branch-2.3. 
  
 I found that the problem is related to procedure scheduling. See the log below:
 2020-10-21 13:52:28,097 INFO  [PEWorker-1] procedure2.ProcedureExecutor(1427): 
Finished pid=296, ppid=45, state=SUCCESS; 
org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure in 1.6250sec
 2020-10-21 13:52:28,538 INFO  [PEWorker-3] procedure2.ProcedureExecutor(1427): 
Finished pid=45, ppid=20, state=SUCCESS; TransitRegionStateProcedure 
table=testLotsOfRegionRepli2, region=50703895da3cb8c942d3197600d549bc, ASSIGN 
in 59.4330sec
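
 To compare the two durations mechanically, the "Finished" lines can be parsed 
with a regular expression. This is only a sketch: the pattern is inferred from 
the two lines quoted above (rejoined onto one line each), and the names 
FINISHED and parse_finished are illustrative, not part of HBase.

```python
import re

# Pattern inferred from the two quoted ProcedureExecutor lines only; other
# HBase versions or procedure types may format this message differently.
FINISHED = re.compile(
    r"Finished pid=(?P<pid>\d+), ppid=(?P<ppid>\d+), state=(?P<state>\w+); "
    r"(?P<desc>.*) in (?P<secs>\d+(?:\.\d+)?)sec"
)


def parse_finished(line):
    """Return pid, ppid, state and duration from a 'Finished' log line, or None."""
    m = FINISHED.search(line)
    if m is None:
        return None
    return {
        "pid": int(m["pid"]),
        "ppid": int(m["ppid"]),
        "state": m["state"],
        "seconds": float(m["secs"]),
    }


# The two log lines from this issue, with the mail line wrapping undone.
LINES = [
    "2020-10-21 13:52:28,097 INFO  [PEWorker-1] "
    "procedure2.ProcedureExecutor(1427): Finished pid=296, ppid=45, "
    "state=SUCCESS; "
    "org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure "
    "in 1.6250sec",
    "2020-10-21 13:52:28,538 INFO  [PEWorker-3] "
    "procedure2.ProcedureExecutor(1427): Finished pid=45, ppid=20, "
    "state=SUCCESS; TransitRegionStateProcedure "
    "table=testLotsOfRegionRepli2, "
    "region=50703895da3cb8c942d3197600d549bc, ASSIGN in 59.4330sec",
]

child, parent = [parse_finished(line) for line in LINES]

# Time the parent procedure spent outside its OpenRegionProcedure child.
gap = round(parent["seconds"] - child["seconds"], 3)
print(gap)
```

 The difference is the time pid=45 spent outside its child pid=296.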
  
 The real assign procedure (pid=296) took only 1.6 seconds, but the 
TransitRegionStateProcedure (pid=45) took 59.4 seconds. The pid=45 procedure was 
initialized at 2020-10-21 13:51:28,666 and added to the TableQueue at 
2020-10-21 13:51:28,789, but it did not take the xlock and run until 
2020-10-21 13:52:24,761, about 56 seconds later. See the log below:
 2020-10-21 13:51:28,789 DEBUG [PEWorker-4] 
procedure.MasterProcedureScheduler(352): Add TableQueue(testLotsOfRegionRepli2, 
xlock=true (20) sharedLock=0 size=25) to run queue because: pid=45, ppid=20, 
state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
TransitRegionStateProcedure table=testLotsOfRegionRepli2, 
region=50703895da3cb8c942d3197600d549bc, ASSIGN has the excusive lock access
 2020-10-21 13:52:24,761 INFO  [PEWorker-2] 
procedure.MasterProcedureScheduler(737): Took xlock for pid=45, ppid=20, 
state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
TransitRegionStateProcedure table=testLotsOfRegionRepli2, 
region=50703895da3cb8c942d3197600d549bc, ASSIGN
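
 The wait between the two MasterProcedureScheduler lines can be computed from 
their timestamps. A minimal sketch, assuming the log4j-style timestamp layout 
seen in the lines above; seconds_between is a hypothetical helper name.

```python
from datetime import datetime

# Timestamp layout as it appears in the quoted log lines (comma before millis).
FMT = "%Y-%m-%d %H:%M:%S,%f"


def seconds_between(start, end):
    """Elapsed seconds between two log timestamps given as strings."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


queued = "2020-10-21 13:51:28,789"      # TableQueue added to the run queue
took_xlock = "2020-10-21 13:52:24,761"  # "Took xlock" for pid=45

wait = seconds_between(queued, took_xlock)
print(f"{wait:.3f}")  # 55.972
```

 That is roughly 56 of the 59.4 seconds reported for pid=45, so nearly all of 
the time was spent waiting between being queued and taking the xlock.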
  
 But when I ran this UT on another PC, it took only 2 mins, the same as on 
branch-2.3, which is odd.
  
 Marked this as a blocker for release 2.2.7.

 

If you are interested in this, please run "mvn clean test 
-Dtest=TestRegionReplicaFailover" and post the elapsed time in a comment here. 
Thanks.
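
 For anyone who wants a wall-clock number that does not depend on Maven's own 
summary, a small wrapper like the one below could be used. It is only a 
sketch: time_command and MVN_CMD are illustrative names, and the checkout 
path in the commented-out call is a placeholder.

```python
import subprocess
import time


def time_command(cmd, cwd=None):
    """Run cmd and return (elapsed wall-clock seconds, exit code)."""
    start = time.monotonic()
    result = subprocess.run(cmd, cwd=cwd)
    return time.monotonic() - start, result.returncode


# The command from this issue, split into an argument list.
MVN_CMD = ["mvn", "clean", "test", "-Dtest=TestRegionReplicaFailover"]

# Example call (placeholder path; point it at your own HBase checkout):
# elapsed, code = time_command(MVN_CMD, cwd="/path/to/hbase")
# print(f"exit={code} elapsed={elapsed / 60:.1f} min")
```

 Running it once on branch-2.2 and once on branch-2.3 gives two directly 
comparable figures.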
  



> Create table very slowly if there are multi regions
> ---------------------------------------------------
>
>                 Key: HBASE-25225
>                 URL: https://issues.apache.org/jira/browse/HBASE-25225
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.2.6
>            Reporter: Guanghao Zhang
>            Priority: Blocker
>
> Running the same UT, TestRegionReplicaFailover, on my local PC (mvn clean 
> test -Dtest=TestRegionReplicaFailover) takes 8 mins on branch-2.2 but only 
> 2 mins on branch-2.3. 
>   
>  I found that the problem is related to procedure scheduling. See the log 
> below:
>  2020-10-21 13:52:28,097 INFO  [PEWorker-1] 
> procedure2.ProcedureExecutor(1427): Finished pid=296, ppid=45, state=SUCCESS; 
> org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure in 1.6250sec
>  2020-10-21 13:52:28,538 INFO  [PEWorker-3] 
> procedure2.ProcedureExecutor(1427): Finished pid=45, ppid=20, state=SUCCESS; 
> TransitRegionStateProcedure table=testLotsOfRegionRepli2, 
> region=50703895da3cb8c942d3197600d549bc, ASSIGN in 59.4330sec
>   
>  The real assign procedure (pid=296) took only 1.6 seconds, but the 
> TransitRegionStateProcedure (pid=45) took 59.4 seconds. The pid=45 procedure 
> was initialized at 2020-10-21 13:51:28,666 and added to the TableQueue at 
> 2020-10-21 13:51:28,789, but it did not take the xlock and run until 
> 2020-10-21 13:52:24,761, about 56 seconds later. See the log below:
>  2020-10-21 13:51:28,789 DEBUG [PEWorker-4] 
> procedure.MasterProcedureScheduler(352): Add 
> TableQueue(testLotsOfRegionRepli2, xlock=true (20) sharedLock=0 size=25) to 
> run queue because: pid=45, ppid=20, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=testLotsOfRegionRepli2, 
> region=50703895da3cb8c942d3197600d549bc, ASSIGN has the excusive lock access
>  2020-10-21 13:52:24,761 INFO  [PEWorker-2] 
> procedure.MasterProcedureScheduler(737): Took xlock for pid=45, ppid=20, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=testLotsOfRegionRepli2, 
> region=50703895da3cb8c942d3197600d549bc, ASSIGN
>   
>  But when I ran this UT on another PC, it took only 2 mins, the same as on 
> branch-2.3, which is odd.
>   
>  Marked this as a blocker for release 2.2.7.
>  
> If you are interested in this, please run "mvn clean test 
> -Dtest=TestRegionReplicaFailover" and post the elapsed time in a comment 
> here. Thanks.
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
