[ https://issues.apache.org/jira/browse/HBASE-25225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guanghao Zhang updated HBASE-25225:
-----------------------------------
    Description:
Running the same UT, TestRegionReplicaFailover, on my local PC (mvn clean test -Dtest=TestRegionReplicaFailover) takes 8 minutes on branch-2.2 but only 2 minutes on branch-2.3.

I found the problem is related to procedure scheduling. See the log below:

2020-10-21 13:52:28,097 INFO [PEWorker-1] procedure2.ProcedureExecutor(1427): Finished pid=296, ppid=45, state=SUCCESS; org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure in 1.6250sec
2020-10-21 13:52:28,538 INFO [PEWorker-3] procedure2.ProcedureExecutor(1427): Finished pid=45, ppid=20, state=SUCCESS; TransitRegionStateProcedure table=testLotsOfRegionRepli2, region=50703895da3cb8c942d3197600d549bc, ASSIGN in 59.4330sec

The actual assign procedure (OpenRegionProcedure) took only 1.6 seconds, but the TransitRegionStateProcedure took 59.4 seconds. The pid=45 procedure was initialized at 2020-10-21 13:51:28,666 and added to the TableQueue at 2020-10-21 13:51:28,789, but it did not take the xlock and run until 2020-10-21 13:52:24,761. See the log below:

2020-10-21 13:51:28,789 DEBUG [PEWorker-4] procedure.MasterProcedureScheduler(352): Add TableQueue(testLotsOfRegionRepli2, xlock=true (20) sharedLock=0 size=25) to run queue because: pid=45, ppid=20, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; TransitRegionStateProcedure table=testLotsOfRegionRepli2, region=50703895da3cb8c942d3197600d549bc, ASSIGN has the excusive lock access
2020-10-21 13:52:24,761 INFO [PEWorker-2] procedure.MasterProcedureScheduler(737): Took xlock for pid=45, ppid=20, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; TransitRegionStateProcedure table=testLotsOfRegionRepli2, region=50703895da3cb8c942d3197600d549bc, ASSIGN

But when I tried this UT on another PC, it took only 2 minutes, the same as branch-2.3. It is weird.

Marked this as a blocker for release 2.2.7.

If you are interested in this, please run "mvn clean test -Dtest=TestRegionReplicaFailover" and comment with the time it took here. Thanks.
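
As a rough check on the timestamps quoted in the description, the gaps between the log events can be computed with a small sketch like the one below. The class name ProcedureWaitTime and the method waitMillis are hypothetical helpers for illustration and are not part of HBase; the timestamp pattern is an assumption based on the log lines quoted above.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class ProcedureWaitTime {
  // Assumed layout of the timestamp at the start of each quoted log line.
  private static final DateTimeFormatter LOG_TS =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

  /** Returns the milliseconds elapsed between two log timestamps. */
  public static long waitMillis(String from, String to) {
    LocalDateTime start = LocalDateTime.parse(from, LOG_TS);
    LocalDateTime end = LocalDateTime.parse(to, LOG_TS);
    return Duration.between(start, end).toMillis();
  }

  public static void main(String[] args) {
    // Timestamps copied from the description for pid=45.
    String initialized = "2020-10-21 13:51:28,666";
    String queued = "2020-10-21 13:51:28,789";
    String tookXlock = "2020-10-21 13:52:24,761";

    System.out.println("initialized -> queued: "
        + waitMillis(initialized, queued) + " ms");
    System.out.println("queued -> took xlock:  "
        + waitMillis(queued, tookXlock) + " ms");
  }
}
```

With the values from the description, the procedure waits 123 ms before being queued and 55972 ms (about 56 seconds) between being queued and taking the xlock, which accounts for most of the 59.4 seconds reported for the TransitRegionStateProcedure.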

> Create table very slowly if there are multi regions
> ---------------------------------------------------
>
>                 Key: HBASE-25225
>                 URL: https://issues.apache.org/jira/browse/HBASE-25225
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.2.6
>            Reporter: Guanghao Zhang
>            Priority: Blocker

--
This message was sent by Atlassian Jira
(v8.3.4#803005)