[ https://issues.apache.org/jira/browse/HBASE-23613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002679#comment-17002679 ]
Lijin Bin commented on HBASE-23613:
-----------------------------------
This is a temporary state: the call at
org.apache.hadoop.hbase.client.HTable.put(HTable.java:540) will eventually
time out and release the region state lock, but that takes more than 15 minutes.
{code}
2019-12-23 16:03:44,264 INFO [KeepAlivePEWorker-76] procedure2.ProcedureExecutor: Initialized subprocedures=[{pid=2267470, ppid=2267468, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN}]
2019-12-23 16:21:19,353 DEBUG [PEWorker-16] procedure.MasterProcedureScheduler: Remove TableQueue(hbase:meta, xlock=false sharedLock=0 size=0) from run queue because: queue is empty after polling out pid=2267470, ppid=2267468, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN
2019-12-23 16:21:19,353 INFO [PEWorker-16] procedure.MasterProcedureScheduler: Took xlock for pid=2267470, ppid=2267468, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN
2019-12-23 16:21:19,425 INFO [PEWorker-16] assignment.TransitRegionStateProcedure: Starting pid=2267470, ppid=2267468, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, locked=true; TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN; rit=OPEN, location=null; forceNewPlan=true, retain=false
2019-12-23 16:21:25,895 INFO [PEWorker-6] procedure2.ProcedureExecutor: Finished pid=2267470, ppid=2267468, state=SUCCESS; TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN in 17mins, 41.329sec
{code}
> ProcedureExecutor check StuckWorkers blocked by DeadServerMetricRegionChore
> ---------------------------------------------------------------------------
>
> Key: HBASE-23613
> URL: https://issues.apache.org/jira/browse/HBASE-23613
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 2.2.2
> Reporter: Lijin Bin
> Assignee: Lijin Bin
> Priority: Major
>
> After debugging, I found that WorkerMonitor in ProcedureExecutor does not run
> for a while because it is blocked by DeadServerMetricRegionChore.
> TimeoutExecutorThread executes not only WorkerMonitor but also
> DeadServerMetricRegionChore, RegionInTransitionChore, ...
> {code}
> "ProcExecTimeout" #1052 daemon prio=5 os_prio=0 tid=0x00007f5c98cc4000 nid=0x229 waiting on condition [0x00007f5c2f857000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00000005c312ad80> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>     at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
>     at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>     at org.apache.hadoop.hbase.master.assignment.RegionStateNode.lock(RegionStateNode.java:313)
>     at org.apache.hadoop.hbase.master.assignment.AssignmentManager$DeadServerMetricRegionChore.periodicExecute(AssignmentManager.java:1186)
>     at org.apache.hadoop.hbase.master.assignment.AssignmentManager$DeadServerMetricRegionChore.periodicExecute(AssignmentManager.java:1163)
>     at org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.executeInMemoryChore(TimeoutExecutorThread.java:120)
>     at org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.execDelayedProcedure(TimeoutExecutorThread.java:99)
>     at org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.run(TimeoutExecutorThread.java:66)
>
> "PEWorker-1" #1053 daemon prio=5 os_prio=0 tid=0x00007f5c98cc5800 nid=0x22a in Object.wait() [0x00007f5c2f756000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>     at java.lang.Object.wait(Native Method)
>     at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:168)
>     - locked <0x00000005839f18b0> (a java.util.concurrent.atomic.AtomicBoolean)
>     at org.apache.hadoop.hbase.client.HTable.put(HTable.java:540)
>     at org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateRegionLocation(RegionStateStore.java:209)
>     at org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateUserRegionLocation(RegionStateStore.java:203)
>     at org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateRegionLocation(RegionStateStore.java:141)
>     at org.apache.hadoop.hbase.master.assignment.AssignmentManager.persistToMeta(AssignmentManager.java:1742)
>     at org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.execute(RegionRemoteProcedureBase.java:298)
>     at org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.execute(RegionRemoteProcedureBase.java:58)
>     at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:962)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1648)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1395)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:78)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1965)
> {code}
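The two stack traces above show a general pattern that can be reproduced in miniature: when one thread runs several periodic tasks, a task that blocks on a lock also stalls every other task on that thread. The following is only a sketch under assumptions and is not HBase code. The class name SharedTimerThreadDemo, the method run, and the use of a plain single-threaded ScheduledExecutorService as a stand-in for TimeoutExecutorThread are all hypothetical. The "worker" thread plays the role of PEWorker-1 holding the region lock, the first timer task plays the chore, and the second plays the monitor.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class; none of these names come from HBase.
class SharedTimerThreadDemo {

  /** Returns {monitor runs while the lock was held, monitor runs after release}. */
  static int[] run(long holdMillis) {
    ReentrantLock regionLock = new ReentrantLock();
    AtomicInteger monitorRuns = new AtomicInteger();
    CountDownLatch lockHeld = new CountDownLatch(1);
    CountDownLatch releaseLock = new CountDownLatch(1);
    CountDownLatch choreStarted = new CountDownLatch(1);
    CountDownLatch monitorRan = new CountDownLatch(1);

    // Stand-in for the worker that holds the lock during a slow call.
    Thread worker = new Thread(() -> {
      regionLock.lock();
      try {
        lockHeld.countDown();
        releaseLock.await(); // simulates the long-running put
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        regionLock.unlock();
      }
    }, "worker");

    // One thread runs both the chore and the monitor.
    ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    try {
      worker.start();
      lockHeld.await();

      // Chore: blocks on the lock, occupying the only timer thread.
      timer.execute(() -> {
        choreStarted.countDown();
        regionLock.lock();
        regionLock.unlock();
      });
      // Monitor: queued behind the chore, so it cannot run until the chore returns.
      timer.scheduleAtFixedRate(() -> {
        monitorRuns.incrementAndGet();
        monitorRan.countDown();
      }, 0, 10, TimeUnit.MILLISECONDS);

      choreStarted.await();
      Thread.sleep(holdMillis);
      int whileBlocked = monitorRuns.get();

      releaseLock.countDown(); // the slow call "times out" and the lock is freed
      monitorRan.await(5, TimeUnit.SECONDS);
      int afterRelease = monitorRuns.get();
      worker.join();
      return new int[] {whileBlocked, afterRelease};
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      releaseLock.countDown();
      timer.shutdownNow();
    }
  }

  public static void main(String[] args) {
    int[] r = run(200);
    System.out.println("monitor runs while chore blocked: " + r[0]);
    System.out.println("monitor runs after lock release: " + r[1]);
  }
}
```

In this sketch the monitor never runs while the lock is held and only resumes once the lock is released, which mirrors WorkerMonitor being unable to report the stuck worker until the put times out.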