[
https://issues.apache.org/jira/browse/HBASE-21625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sergey Shelukhin resolved HBASE-21625.
--------------------------------------
Resolution: Cannot Reproduce
Probably a dup of OpenRegionProcedure issues
> a runnable procedure v2 does not run
> ------------------------------------
>
> Key: HBASE-21625
> URL: https://issues.apache.org/jira/browse/HBASE-21625
> Project: HBase
> Issue Type: Bug
> Components: amv2, proc-v2
> Affects Versions: 3.0.0
> Reporter: Sergey Shelukhin
> Priority: Critical
>
> This is on master snapshot as of a few weeks ago.
> Haven't looked at the code much yet, but it seems rather fundamental. The
> procedure comes from meta replica assignment (HBASE-21624), in case it
> matters w.r.t. the engine initialization; however, the master is functional
> and other procedures run fine. I can also see lots of other open region
> procedures with a similar patterns that were initialized before this one and
> have run fine.
> Currently, there are no other runnable procedures on master - a lot of
> succeeded procedures since then, the parent blocked on this procedure, and
> one unrelated RIT procedure waiting with timeout and being updated
> periodically.
> The procedure itself is
> {noformat}
> 157156 157155 RUNNABLE hadoop
> org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure Wed Dec 19
> 17:20:27 PST 2018 Wed Dec 19 17:20:28 PST 2018 [ { region => {
> regionId => '1', tableName => { ... }, startKey => '', endKey => '', offline
> => 'false', split => 'false', replicaId => '1' }, targetServer => { hostName
> => 'server1', port => '17020', startCode => '1545266805778' } }, {} ]
> {noformat}
> This is in PST so it's been like that for ~19 hours.
> The only line involving this PID in the log is {noformat}
> 2018-12-19 17:20:27,974 INFO [PEWorker-4] procedure2.ProcedureExecutor:
> Initialized subprocedures=[{pid=157156, ppid=157155, state=RUNNABLE,
> hasLock=false; org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure}]
> {noformat}
> There are no other useful logs for either this PID, parent PID, or region in
> question since. This PEWorker (4) is also alive and did some work since then,
> so it's not like the thread errored out somewhere.
> All the PEWorker-s are waiting for work:
> {noformat}
> Thread 158 (PEWorker-16):
> State: TIMED_WAITING
> Blocked count: 1340
> Waited count: 5064
> Stack:
> sun.misc.Unsafe.park(Native Method)
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
>
> org.apache.hadoop.hbase.procedure2.AbstractProcedureScheduler.poll(AbstractProcedureScheduler.java:171)
>
> org.apache.hadoop.hbase.procedure2.AbstractProcedureScheduler.poll(AbstractProcedureScheduler.java:153)
>
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1949)
> {noformat}
> The main assignment procedure for this region is blocked on it:
> {noformat}
> 157155 WAITING hadoop TransitRegionStateProcedure
> table=hbase:meta, region=534574363, ASSIGN Wed Dec 19 17:20:27 PST 2018
> Wed Dec 19 17:20:27 PST 2018 [ { state => [ '1', '2', '3' ] }, {
> regionId => '1', tableName => { ... }, startKey => '', endKey => '', offline
> => 'false', split => 'false', replicaId => '1' }, { initialState =>
> 'REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE', lastState =>
> 'REGION_STATE_TRANSITION_CONFIRM_OPENED', assignCandidate => { hostName =>
> 'server1', port => '17020', startCode => '1545266805778' }, forceNewPlan =>
> 'false' } ]
> 2018-12-19 17:20:27,673 INFO [PEWorker-9]
> procedure.MasterProcedureScheduler: Took xlock for pid=157155,
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=false;
> TransitRegionStateProcedure table=hbase:meta, region=..., ASSIGN
> 2018-12-19 17:20:27,809 INFO [PEWorker-9]
> assignment.TransitRegionStateProcedure: Starting pid=157155,
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=true;
> TransitRegionStateProcedure table=hbase:meta, region=..., ASSIGN;
> rit=OFFLINE, location=server1,17020,1545266805778; forceNewPlan=false,
> retain=false
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)