[ 
https://issues.apache.org/jira/browse/HBASE-21625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-21625:
-------------------------------------
    Description: 
This is on master as of a few weeks ago.
Haven't looked at the code much yet, but it seems rather fundamental. The 
procedure comes from meta replica assignment (HBASE-21624), in case it matters 
w.r.t. the engine initialization; however, the master is functional and other 
procedures run fine.
Currently, there are no other runnable procedures on master - a lot of 
succeeded procedures since then, the parent blocked on this procedure, and one 
unrelated RIT procedure waiting with timeout and being updated periodically.

The procedure itself is 
{noformat}
157156  157155  RUNNABLE        hadoop  
org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure   Wed Dec 19 
17:20:27 PST 2018    Wed Dec 19 17:20:28 PST 2018            [ { region => { 
regionId => '1', tableName => { ... }, startKey => '', endKey => '', offline => 
'false', split => 'false', replicaId => '1' }, targetServer => { hostName => 
'server1', port => '17020', startCode => '1545266805778' } }, {} ]
{noformat}
This is in PST so it's been like that for ~19 hours.

The only line involving this PID in the log is {noformat}
2018-12-19 17:20:27,974 INFO  [PEWorker-4] procedure2.ProcedureExecutor: 
Initialized subprocedures=[{pid=157156, ppid=157155, state=RUNNABLE, 
hasLock=false; org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure}]
{noformat}
There are no other useful logs for either this PID, parent PID, or region in 
question since.

All the PEWorker-s are waiting for work:
{noformat}
Thread 158 (PEWorker-16):
  State: TIMED_WAITING
  Blocked count: 1340
  Waited count: 5064
  Stack:
    sun.misc.Unsafe.park(Native Method)
    java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
    
org.apache.hadoop.hbase.procedure2.AbstractProcedureScheduler.poll(AbstractProcedureScheduler.java:171)
    
org.apache.hadoop.hbase.procedure2.AbstractProcedureScheduler.poll(AbstractProcedureScheduler.java:153)
    
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1949)
{noformat}

The main assignment procedure for this region is blocked on it:
{noformat}
157155          WAITING         hadoop  TransitRegionStateProcedure 
table=hbase:meta, region=534574363, ASSIGN  Wed Dec 19 17:20:27 PST 2018    Wed 
Dec 19 17:20:27 PST 2018            [ { state => [ '1', '2', '3' ] }, { 
regionId => '1', tableName => { ... }, startKey => '', endKey => '', offline => 
'false', split => 'false', replicaId => '1' }, { initialState => 
'REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE', lastState => 
'REGION_STATE_TRANSITION_CONFIRM_OPENED', assignCandidate => { hostName => 
'server1', port => '17020', startCode => '1545266805778' }, forceNewPlan => 
'false' } ]


2018-12-19 17:20:27,673 INFO  [PEWorker-9] procedure.MasterProcedureScheduler: 
Took xlock for pid=157155, 
state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=false; 
TransitRegionStateProcedure table=hbase:meta, region=..., ASSIGN
2018-12-19 17:20:27,809 INFO  [PEWorker-9] 
assignment.TransitRegionStateProcedure: Starting pid=157155, 
state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=true; 
TransitRegionStateProcedure table=hbase:meta, region=..., ASSIGN; rit=OFFLINE, 
location=server1,17020,1545266805778; forceNewPlan=false, retain=false
{noformat}


  was:
This is on master as of a few weeks ago.
Haven't looked at the code much yet, but it seems rather fundamental. The 
procedure comes from meta replica assignment (HBASE-21624).
There are no other runnable procedures on master - a lot of succeeded 
procedures since then, and one unrelated RIT procedure waiting with timeout and 
being updated peroodically.

The procedure itself is 
{noformat}
157156  157155  RUNNABLE        hadoop  
org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure   Wed Dec 19 
17:20:27 PST 2018    Wed Dec 19 17:20:28 PST 2018            [ { region => { 
regionId => '1', tableName => { ... }, startKey => '', endKey => '', offline => 
'false', split => 'false', replicaId => '1' }, targetServer => { hostName => 
'server1', port => '17020', startCode => '1545266805778' } }, {} ]
{noformat}
This is in PST so it's been like that for ~19 hours.

The only line involving this PID in the log is {noformat}
2018-12-19 17:20:27,974 INFO  [PEWorker-4] procedure2.ProcedureExecutor: 
Initialized subprocedures=[{pid=157156, ppid=157155, state=RUNNABLE, 
hasLock=false; org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure}]
{noformat}
There are no other useful logs for either this PID, parent PID, or region in 
question since.

All the PEWorker-s are waiting for work:
{noformat}
Thread 158 (PEWorker-16):
  State: TIMED_WAITING
  Blocked count: 1340
  Waited count: 5064
  Stack:
    sun.misc.Unsafe.park(Native Method)
    java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
    
org.apache.hadoop.hbase.procedure2.AbstractProcedureScheduler.poll(AbstractProcedureScheduler.java:171)
    
org.apache.hadoop.hbase.procedure2.AbstractProcedureScheduler.poll(AbstractProcedureScheduler.java:153)
    
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1949)
{noformat}

The main assignment procedure for this region is blocked on it:
{noformat}
157155          WAITING         hadoop  TransitRegionStateProcedure 
table=hbase:meta, region=534574363, ASSIGN  Wed Dec 19 17:20:27 PST 2018    Wed 
Dec 19 17:20:27 PST 2018            [ { state => [ '1', '2', '3' ] }, { 
regionId => '1', tableName => { ... }, startKey => '', endKey => '', offline => 
'false', split => 'false', replicaId => '1' }, { initialState => 
'REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE', lastState => 
'REGION_STATE_TRANSITION_CONFIRM_OPENED', assignCandidate => { hostName => 
'server1', port => '17020', startCode => '1545266805778' }, forceNewPlan => 
'false' } ]


2018-12-19 17:20:27,673 INFO  [PEWorker-9] procedure.MasterProcedureScheduler: 
Took xlock for pid=157155, 
state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=false; 
TransitRegionStateProcedure table=hbase:meta, region=..., ASSIGN
2018-12-19 17:20:27,809 INFO  [PEWorker-9] 
assignment.TransitRegionStateProcedure: Starting pid=157155, 
state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=true; 
TransitRegionStateProcedure table=hbase:meta, region=..., ASSIGN; rit=OFFLINE, 
location=server1,17020,1545266805778; forceNewPlan=false, retain=false
{noformat}



> a runnable procedure v2 does not run
> ------------------------------------
>
>                 Key: HBASE-21625
>                 URL: https://issues.apache.org/jira/browse/HBASE-21625
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Priority: Major
>
> This is on master as of a few weeks ago.
> Haven't looked at the code much yet, but it seems rather fundamental. The 
> procedure comes from meta replica assignment (HBASE-21624), in case it 
> matters w.r.t. the engine initialization; however, the master is functional 
> and other procedures run fine.
> Currently, there are no other runnable procedures on master - a lot of 
> succeeded procedures since then, the parent blocked on this procedure, and 
> one unrelated RIT procedure waiting with timeout and being updated 
> periodically.
> The procedure itself is 
> {noformat}
> 157156        157155  RUNNABLE        hadoop  
> org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure   Wed Dec 19 
> 17:20:27 PST 2018    Wed Dec 19 17:20:28 PST 2018            [ { region => { 
> regionId => '1', tableName => { ... }, startKey => '', endKey => '', offline 
> => 'false', split => 'false', replicaId => '1' }, targetServer => { hostName 
> => 'server1', port => '17020', startCode => '1545266805778' } }, {} ]
> {noformat}
> This is in PST so it's been like that for ~19 hours.
> The only line involving this PID in the log is {noformat}
> 2018-12-19 17:20:27,974 INFO  [PEWorker-4] procedure2.ProcedureExecutor: 
> Initialized subprocedures=[{pid=157156, ppid=157155, state=RUNNABLE, 
> hasLock=false; org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure}]
> {noformat}
> There are no other useful logs for either this PID, parent PID, or region in 
> question since.
> All the PEWorker-s are waiting for work:
> {noformat}
> Thread 158 (PEWorker-16):
>   State: TIMED_WAITING
>   Blocked count: 1340
>   Waited count: 5064
>   Stack:
>     sun.misc.Unsafe.park(Native Method)
>     java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>     
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
>     
> org.apache.hadoop.hbase.procedure2.AbstractProcedureScheduler.poll(AbstractProcedureScheduler.java:171)
>     
> org.apache.hadoop.hbase.procedure2.AbstractProcedureScheduler.poll(AbstractProcedureScheduler.java:153)
>     
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1949)
> {noformat}
> The main assignment procedure for this region is blocked on it:
> {noformat}
> 157155                WAITING         hadoop  TransitRegionStateProcedure 
> table=hbase:meta, region=534574363, ASSIGN  Wed Dec 19 17:20:27 PST 2018    
> Wed Dec 19 17:20:27 PST 2018            [ { state => [ '1', '2', '3' ] }, { 
> regionId => '1', tableName => { ... }, startKey => '', endKey => '', offline 
> => 'false', split => 'false', replicaId => '1' }, { initialState => 
> 'REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE', lastState => 
> 'REGION_STATE_TRANSITION_CONFIRM_OPENED', assignCandidate => { hostName => 
> 'server1', port => '17020', startCode => '1545266805778' }, forceNewPlan => 
> 'false' } ]
> 2018-12-19 17:20:27,673 INFO  [PEWorker-9] 
> procedure.MasterProcedureScheduler: Took xlock for pid=157155, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=false; 
> TransitRegionStateProcedure table=hbase:meta, region=..., ASSIGN
> 2018-12-19 17:20:27,809 INFO  [PEWorker-9] 
> assignment.TransitRegionStateProcedure: Starting pid=157155, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=true; 
> TransitRegionStateProcedure table=hbase:meta, region=..., ASSIGN; 
> rit=OFFLINE, location=server1,17020,1545266805778; forceNewPlan=false, 
> retain=false
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to