[
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16654628#comment-16654628
]
stack commented on HBASE-21291:
-------------------------------
So, if override is set, make the waitTime some nominal amount -- say 10ms? This
way we wait on the lock for a little while but will proceed after 10ms even if
we don't get the lock?
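A rough sketch of that idea follows. It is not the HBase code: a plain
java.util.concurrent ReentrantLock stands in for the IdLock, and
BypassLockSketch, mayBypass, demo and OVERRIDE_WAIT_MS are all made-up names.
The only point is to show the wait collapsing to a nominal 10ms when override
is set, and the bypass going ahead whether or not the lock was obtained.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; none of these names come from HBase.
final class BypassLockSketch {
  // Nominal wait when override is set; 10ms is the figure floated above.
  static final long OVERRIDE_WAIT_MS = 10;

  // Returns true if the bypass may go ahead.
  static boolean mayBypass(ReentrantLock lock, long lockWaitMs, boolean override)
      throws InterruptedException {
    long waitMs = override ? OVERRIDE_WAIT_MS : lockWaitMs;
    boolean locked = lock.tryLock(waitMs, TimeUnit.MILLISECONDS);
    try {
      // Without override the lock is required; with override we proceed anyway.
      return locked || override;
    } finally {
      if (locked) {
        lock.unlock();
      }
    }
  }

  // Holds the lock on another thread (standing in for a stuck procedure)
  // and returns {result without override, result with override}.
  static boolean[] demo() {
    ReentrantLock lock = new ReentrantLock();
    CountDownLatch held = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1);
    Thread holder = new Thread(() -> {
      lock.lock();
      try {
        held.countDown();
        release.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        lock.unlock();
      }
    });
    holder.setDaemon(true);
    holder.start();
    try {
      held.await();
      boolean plain = mayBypass(lock, 50, false);
      boolean forced = mayBypass(lock, 50, true);
      release.countDown();
      holder.join();
      return new boolean[] {plain, forced};
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    boolean[] r = demo();
    System.out.println("without override: " + r[0] + ", with override: " + r[1]);
  }
}
```

While the other thread holds the lock, the call without override gives up
after its wait and reports false; the call with override waits about 10ms and
reports true. Note that in the override case the caller proceeds without
holding the lock, so whatever it does next has to be safe against the owner
still running.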
So, doing bypass on an AssignProcedure, I see that it reports this:
2424243 2341740 RUNNABLE(Bypass)
... but it still has exclusive lock held.
{code}
REGION: 2651cb48574979f2dccc64e3c02ad5e0
Lock type: EXCLUSIVE
Owner procedure: { ID => '2424243', PARENT_ID => '2341740', STATE =>
'RUNNABLE', OWNER => 'hbase', TYPE => 'AssignProcedure
table=IntegrationTestBigLinkedList_20180815104044,
region=2651cb48574979f2dccc64e3c02ad5e0', START_TIME => 'Wed Oct 17 12:41:37
PDT 2018', LAST_UPDATE => 'Wed Oct 17 15:29:53 PDT 2018', PARAMETERS => [ {
transitionState => 'REGION_TRANSITION_DISPATCH', regionInfo => { regionId =>
'1534357375831', tableName => { namespace => 'ZGVmYXVsdA==', qualifier =>
'SW50ZWdyYXRpb25UZXN0QmlnTGlua2VkTGlzdF8yMDE4MDgxNTEwNDA0NA==' }, startKey =>
'Np+lMnWysqQ=', endKey => 'NqGeiPHP0F8=', offline => 'false', split => 'false',
replicaId => '0' } } ] }
{code}
Maybe the lock is held because we are not running this Procedure... the
ProcedureExecutor (PE) is jammed up?
> Add a test for bypassing stuck state-machine procedures
> -------------------------------------------------------
>
> Key: HBASE-21291
> URL: https://issues.apache.org/jira/browse/HBASE-21291
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.2.0
> Reporter: Jingyun Tian
> Assignee: Jingyun Tian
> Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21291.master.001.patch,
> HBASE-21291.master.002.patch, HBASE-21291.master.003.patch,
> HBASE-21291.master.004.patch, HBASE-21291.master.005.patch
>
>
> {code}
> if (!procedure.isFailed()) {
>   if (subprocs != null) {
>     if (subprocs.length == 1 && subprocs[0] == procedure) {
>       // Procedure returned itself. Quick-shortcut for a state machine-like procedure;
>       // i.e. we go around this loop again rather than go back out on the scheduler queue.
>       subprocs = null;
>       reExecute = true;
>       LOG.trace("Short-circuit to next step on pid={}", procedure.getProcId());
>     } else {
>       // Yield the current procedure, and make the subprocedure runnable
>       // subprocs may come back 'null'.
>       subprocs = initializeChildren(procStack, procedure, subprocs);
>       LOG.info("Initialized subprocedures=" +
>         (subprocs == null? null:
>           Stream.of(subprocs).map(e -> "{" + e.toString() + "}").
>           collect(Collectors.toList()).toString()));
>     }
>   } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) {
>     LOG.debug("Added to timeoutExecutor {}", procedure);
>     timeoutExecutor.add(procedure);
>   } else if (!suspended) {
>     // No subtask, so we are done
>     procedure.setState(ProcedureState.SUCCESS);
>   }
> }
> {code}
> The current implementation of ProcedureExecutor sets reExecute to true for
> a state-machine-like procedure. If such a procedure is stuck in one state,
> it will loop forever.
> {code}
> IdLock.Entry lockEntry = procExecutionLock.getLockEntry(proc.getProcId());
> try {
>   executeProcedure(proc);
> } catch (AssertionError e) {
>   LOG.info("ASSERT pid=" + proc.getProcId(), e);
>   throw e;
> } finally {
>   procExecutionLock.releaseLockEntry(lockEntry);
> }
> {code}
> Since a procedure takes the IdLock before executing and releases it only
> after execution is done, a state machine procedure that keeps re-executing
> never releases the IdLock until it finishes. bypassProcedure then does not
> work, because the first thing it does is try to grab the same IdLock.
> {code}
> IdLock.Entry lockEntry = procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait);
> {code}