[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16654619#comment-16654619
 ] 

stack commented on HBASE-21291:
-------------------------------

Is it possible that bypass no longer works? IIRC, I could bypass a stuck 
Assign before. Now it goes through the bypass steps but the procedure stays 
stuck. The log says it is bypassed, but the lock is still held:

{code}
2018-10-17 21:12:21,051 INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Begin bypass pid=2424243, ppid=2341740, state=RUNNABLE:REGION_TRANSITION_DISPATCH, bypass=LOG-REDACTED AssignProcedure table=IntegrationTestBigLinkedList_20180815104044, region=2651cb48574979f2dccc64e3c02ad5e0 with lockWait=1000, override=true, recursive=true
2018-10-17 21:12:21,051 INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Bypassing pid=2424243, ppid=2341740, state=RUNNABLE:REGION_TRANSITION_DISPATCH, bypass=LOG-REDACTED AssignProcedure table=IntegrationTestBigLinkedList_20180815104044, region=2651cb48574979f2dccc64e3c02ad5e0
2018-10-17 21:12:21,260 INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Bypassing pid=2341740, state=WAITING:SERVER_CRASH_HANDLE_RIT2, bypass=LOG-REDACTED ServerCrashProcedure server=vb1406.halxg.cloudera.com,22101,1539750561781, splitWal=true, meta=false
2018-10-17 21:12:21,386 INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Bypassing pid=2424243, ppid=2341740, state=RUNNABLE:REGION_TRANSITION_DISPATCH, bypass=LOG-REDACTED AssignProcedure table=IntegrationTestBigLinkedList_20180815104044, region=2651cb48574979f2dccc64e3c02ad5e0 and its ancestors successfully, adding to queue
{code}

> Add a test for bypassing stuck state-machine procedures
> -------------------------------------------------------
>
>                 Key: HBASE-21291
>                 URL: https://issues.apache.org/jira/browse/HBASE-21291
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Jingyun Tian
>            Assignee: Jingyun Tian
>            Priority: Major
>             Fix For: 3.0.0, 2.2.0
>
>         Attachments: HBASE-21291.master.001.patch, 
> HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, 
> HBASE-21291.master.004.patch, HBASE-21291.master.005.patch
>
>
> {code}
>       if (!procedure.isFailed()) {
>         if (subprocs != null) {
>           if (subprocs.length == 1 && subprocs[0] == procedure) {
>             // Procedure returned itself. Quick-shortcut for a state
>             // machine-like procedure; i.e. we go around this loop again
>             // rather than go back out on the scheduler queue.
>             subprocs = null;
>             reExecute = true;
>             LOG.trace("Short-circuit to next step on pid={}", 
> procedure.getProcId());
>           } else {
>             // Yield the current procedure, and make the subprocedure runnable
>             // subprocs may come back 'null'.
>             subprocs = initializeChildren(procStack, procedure, subprocs);
>             LOG.info("Initialized subprocedures=" +
>               (subprocs == null? null:
>                 Stream.of(subprocs).map(e -> "{" + e.toString() + "}").
>                 collect(Collectors.toList()).toString()));
>           }
>         } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) {
>           LOG.debug("Added to timeoutExecutor {}", procedure);
>           timeoutExecutor.add(procedure);
>         } else if (!suspended) {
>           // No subtask, so we are done
>           procedure.setState(ProcedureState.SUCCESS);
>         }
>       }
> {code}
> The current implementation of ProcedureExecutor sets reExecute to true 
> for state-machine-like procedures. If such a procedure is stuck in one 
> state, it will loop forever.
> {code}
>           IdLock.Entry lockEntry = 
> procExecutionLock.getLockEntry(proc.getProcId());
>           try {
>             executeProcedure(proc);
>           } catch (AssertionError e) {
>             LOG.info("ASSERT pid=" + proc.getProcId(), e);
>             throw e;
>           } finally {
>             procExecutionLock.releaseLockEntry(lockEntry);
>           }
> {code}
> Because a procedure acquires the IdLock before execution and releases it 
> only after execution returns, a state-machine procedure never releases the 
> IdLock until it finishes.
> bypassProcedure therefore does not work, because it first tries to grab the 
> same IdLock:
> {code}
>     IdLock.Entry lockEntry = 
> procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait);
> {code}
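
The interaction described in the quoted issue can be reproduced in isolation. 
The following is only a minimal sketch, not HBase code: a plain 
java.util.concurrent ReentrantLock stands in for the per-pid IdLock entry, a 
spin loop stands in for the reExecute path, and the class and method names 
(BypassLockSketch, startWorker, tryBypass, unstick) are hypothetical. It 
assumes the bypass path gives up when the timed lock attempt fails, which is 
what the description above suggests.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

class BypassLockSketch {
  // Stand-in for one entry of procExecutionLock (NOT HBase's IdLock).
  private final ReentrantLock procLock = new ReentrantLock();
  // True while the simulated procedure keeps returning itself.
  private final AtomicBoolean stuck = new AtomicBoolean(true);
  // Lets callers wait until the worker really holds the lock.
  private final CountDownLatch lockHeld = new CountDownLatch(1);

  /** Worker thread: takes the lock, then re-executes until unstuck. */
  Thread startWorker() {
    Thread t = new Thread(() -> {
      procLock.lock();
      try {
        lockHeld.countDown();
        while (stuck.get()) { // models the reExecute = true loop
          Thread.yield();
        }
      } finally {
        procLock.unlock();    // only reached once the loop exits
      }
    });
    t.setDaemon(true);
    t.start();
    return t;
  }

  /** Models the first step of a bypass: a timed attempt on the same lock. */
  boolean tryBypass(long lockWaitMs) throws InterruptedException {
    if (!procLock.tryLock(lockWaitMs, TimeUnit.MILLISECONDS)) {
      return false;           // lock still held by the looping worker
    }
    try {
      return true;            // a real bypass would mark the procedure here
    } finally {
      procLock.unlock();
    }
  }

  void awaitLockHeld() throws InterruptedException {
    lockHeld.await();
  }

  void unstick() {
    stuck.set(false);
  }

  public static void main(String[] args) throws InterruptedException {
    BypassLockSketch sketch = new BypassLockSketch();
    Thread worker = sketch.startWorker();
    sketch.awaitLockHeld();
    System.out.println("bypass while stuck: " + sketch.tryBypass(100));
    sketch.unstick();
    worker.join();
    System.out.println("bypass after release: " + sketch.tryBypass(100));
  }
}
```

In this sketch the first attempt times out because the worker never leaves 
its loop, and the second succeeds only after the loop is broken from outside. 
That is the shape of the problem in the description: the lock holder has to 
yield the lock between steps, or the bypass has to be able to proceed without 
it.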



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
