[
https://issues.apache.org/jira/browse/HBASE-21625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sergey Shelukhin updated HBASE-21625:
-------------------------------------
Description:
This is on master snapshot as of a few weeks ago.
Haven't looked at the code much yet, but it seems rather fundamental. The
procedure comes from meta replica assignment (HBASE-21624), in case it matters
w.r.t. the engine initialization; however, the master is functional and other
procedures run fine.
Currently, there are no other runnable procedures on master - a lot of
succeeded procedures since then, the parent blocked on this procedure, and one
unrelated RIT procedure waiting with timeout and being updated periodically.
The procedure itself is
{noformat}
157156 157155 RUNNABLE hadoop
org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure Wed Dec 19
17:20:27 PST 2018 Wed Dec 19 17:20:28 PST 2018 [ { region => {
regionId => '1', tableName => { ... }, startKey => '', endKey => '', offline =>
'false', split => 'false', replicaId => '1' }, targetServer => { hostName =>
'server1', port => '17020', startCode => '1545266805778' } }, {} ]
{noformat}
This is in PST so it's been like that for ~19 hours.
The only line involving this PID in the log is {noformat}
2018-12-19 17:20:27,974 INFO [PEWorker-4] procedure2.ProcedureExecutor:
Initialized subprocedures=[{pid=157156, ppid=157155, state=RUNNABLE,
hasLock=false; org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure}]
{noformat}
There are no other useful logs for either this PID, parent PID, or region in
question since. This PEWorker (4) is also alive and did some work since then,
so it's not like the thread errored out somewhere.
All the PEWorker-s are waiting for work:
{noformat}
Thread 158 (PEWorker-16):
State: TIMED_WAITING
Blocked count: 1340
Waited count: 5064
Stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
org.apache.hadoop.hbase.procedure2.AbstractProcedureScheduler.poll(AbstractProcedureScheduler.java:171)
org.apache.hadoop.hbase.procedure2.AbstractProcedureScheduler.poll(AbstractProcedureScheduler.java:153)
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1949)
{noformat}
The main assignment procedure for this region is blocked on it:
{noformat}
157155 WAITING hadoop TransitRegionStateProcedure
table=hbase:meta, region=534574363, ASSIGN Wed Dec 19 17:20:27 PST 2018 Wed
Dec 19 17:20:27 PST 2018 [ { state => [ '1', '2', '3' ] }, {
regionId => '1', tableName => { ... }, startKey => '', endKey => '', offline =>
'false', split => 'false', replicaId => '1' }, { initialState =>
'REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE', lastState =>
'REGION_STATE_TRANSITION_CONFIRM_OPENED', assignCandidate => { hostName =>
'server1', port => '17020', startCode => '1545266805778' }, forceNewPlan =>
'false' } ]
2018-12-19 17:20:27,673 INFO [PEWorker-9] procedure.MasterProcedureScheduler:
Took xlock for pid=157155,
state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=false;
TransitRegionStateProcedure table=hbase:meta, region=..., ASSIGN
2018-12-19 17:20:27,809 INFO [PEWorker-9]
assignment.TransitRegionStateProcedure: Starting pid=157155,
state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=true;
TransitRegionStateProcedure table=hbase:meta, region=..., ASSIGN; rit=OFFLINE,
location=server1,17020,1545266805778; forceNewPlan=false, retain=false
{noformat}
was:
This is on master snapshot as of a few weeks ago.
Haven't looked at the code much yet, but it seems rather fundamental. The
procedure comes from meta replica assignment (HBASE-21624), in case it matters
w.r.t. the engine initialization; however, the master is functional and other
procedures run fine.
Currently, there are no other runnable procedures on master - a lot of
succeeded procedures since then, the parent blocked on this procedure, and one
unrelated RIT procedure waiting with timeout and being updated periodically.
The procedure itself is
{noformat}
157156 157155 RUNNABLE hadoop
org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure Wed Dec 19
17:20:27 PST 2018 Wed Dec 19 17:20:28 PST 2018 [ { region => {
regionId => '1', tableName => { ... }, startKey => '', endKey => '', offline =>
'false', split => 'false', replicaId => '1' }, targetServer => { hostName =>
'server1', port => '17020', startCode => '1545266805778' } }, {} ]
{noformat}
This is in PST so it's been like that for ~19 hours.
The only line involving this PID in the log is {noformat}
2018-12-19 17:20:27,974 INFO [PEWorker-4] procedure2.ProcedureExecutor:
Initialized subprocedures=[{pid=157156, ppid=157155, state=RUNNABLE,
hasLock=false; org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure}]
{noformat}
There are no other useful logs for either this PID, parent PID, or region in
question since.
All the PEWorker-s are waiting for work:
{noformat}
Thread 158 (PEWorker-16):
State: TIMED_WAITING
Blocked count: 1340
Waited count: 5064
Stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
org.apache.hadoop.hbase.procedure2.AbstractProcedureScheduler.poll(AbstractProcedureScheduler.java:171)
org.apache.hadoop.hbase.procedure2.AbstractProcedureScheduler.poll(AbstractProcedureScheduler.java:153)
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1949)
{noformat}
The main assignment procedure for this region is blocked on it:
{noformat}
157155 WAITING hadoop TransitRegionStateProcedure
table=hbase:meta, region=534574363, ASSIGN Wed Dec 19 17:20:27 PST 2018 Wed
Dec 19 17:20:27 PST 2018 [ { state => [ '1', '2', '3' ] }, {
regionId => '1', tableName => { ... }, startKey => '', endKey => '', offline =>
'false', split => 'false', replicaId => '1' }, { initialState =>
'REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE', lastState =>
'REGION_STATE_TRANSITION_CONFIRM_OPENED', assignCandidate => { hostName =>
'server1', port => '17020', startCode => '1545266805778' }, forceNewPlan =>
'false' } ]
2018-12-19 17:20:27,673 INFO [PEWorker-9] procedure.MasterProcedureScheduler:
Took xlock for pid=157155,
state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=false;
TransitRegionStateProcedure table=hbase:meta, region=..., ASSIGN
2018-12-19 17:20:27,809 INFO [PEWorker-9]
assignment.TransitRegionStateProcedure: Starting pid=157155,
state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=true;
TransitRegionStateProcedure table=hbase:meta, region=..., ASSIGN; rit=OFFLINE,
location=server1,17020,1545266805778; forceNewPlan=false, retain=false
{noformat}
> a runnable procedure v2 does not run
> ------------------------------------
>
> Key: HBASE-21625
> URL: https://issues.apache.org/jira/browse/HBASE-21625
> Project: HBase
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Priority: Major
>
> This is on master snapshot as of a few weeks ago.
> Haven't looked at the code much yet, but it seems rather fundamental. The
> procedure comes from meta replica assignment (HBASE-21624), in case it
> matters w.r.t. the engine initialization; however, the master is functional
> and other procedures run fine.
> Currently, there are no other runnable procedures on master - a lot of
> succeeded procedures since then, the parent blocked on this procedure, and
> one unrelated RIT procedure waiting with timeout and being updated
> periodically.
> The procedure itself is
> {noformat}
> 157156 157155 RUNNABLE hadoop
> org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure Wed Dec 19
> 17:20:27 PST 2018 Wed Dec 19 17:20:28 PST 2018 [ { region => {
> regionId => '1', tableName => { ... }, startKey => '', endKey => '', offline
> => 'false', split => 'false', replicaId => '1' }, targetServer => { hostName
> => 'server1', port => '17020', startCode => '1545266805778' } }, {} ]
> {noformat}
> This is in PST so it's been like that for ~19 hours.
> The only line involving this PID in the log is {noformat}
> 2018-12-19 17:20:27,974 INFO [PEWorker-4] procedure2.ProcedureExecutor:
> Initialized subprocedures=[{pid=157156, ppid=157155, state=RUNNABLE,
> hasLock=false; org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure}]
> {noformat}
> There are no other useful logs for either this PID, parent PID, or region in
> question since. This PEWorker (4) is also alive and did some work since then,
> so it's not like the thread errored out somewhere.
> All the PEWorker-s are waiting for work:
> {noformat}
> Thread 158 (PEWorker-16):
> State: TIMED_WAITING
> Blocked count: 1340
> Waited count: 5064
> Stack:
> sun.misc.Unsafe.park(Native Method)
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
>
> org.apache.hadoop.hbase.procedure2.AbstractProcedureScheduler.poll(AbstractProcedureScheduler.java:171)
>
> org.apache.hadoop.hbase.procedure2.AbstractProcedureScheduler.poll(AbstractProcedureScheduler.java:153)
>
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1949)
> {noformat}
> The main assignment procedure for this region is blocked on it:
> {noformat}
> 157155 WAITING hadoop TransitRegionStateProcedure
> table=hbase:meta, region=534574363, ASSIGN Wed Dec 19 17:20:27 PST 2018
> Wed Dec 19 17:20:27 PST 2018 [ { state => [ '1', '2', '3' ] }, {
> regionId => '1', tableName => { ... }, startKey => '', endKey => '', offline
> => 'false', split => 'false', replicaId => '1' }, { initialState =>
> 'REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE', lastState =>
> 'REGION_STATE_TRANSITION_CONFIRM_OPENED', assignCandidate => { hostName =>
> 'server1', port => '17020', startCode => '1545266805778' }, forceNewPlan =>
> 'false' } ]
> 2018-12-19 17:20:27,673 INFO [PEWorker-9]
> procedure.MasterProcedureScheduler: Took xlock for pid=157155,
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=false;
> TransitRegionStateProcedure table=hbase:meta, region=..., ASSIGN
> 2018-12-19 17:20:27,809 INFO [PEWorker-9]
> assignment.TransitRegionStateProcedure: Starting pid=157155,
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=true;
> TransitRegionStateProcedure table=hbase:meta, region=..., ASSIGN;
> rit=OFFLINE, location=server1,17020,1545266805778; forceNewPlan=false,
> retain=false
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)