Kiran Kumar Maturi created HBASE-30143:
------------------------------------------
Summary: ProcedureExecutor orphans FAILED procedures with
holdLock=true when setRollback() races with child release()
Key: HBASE-30143
URL: https://issues.apache.org/jira/browse/HBASE-30143
Project: HBase
Issue Type: Bug
Components: proc-v2, Region Assignment
Affects Versions: 2.5.14, 2.6.5
Environment: Any HBase deployment running splits/merges or other
StateMachineProcedures with holdLock()==true under concurrent worker load
Reporter: Kiran Kumar Maturi
Assignee: Kiran Kumar Maturi
h3. Summary
{{ProcedureExecutor.executeProcedure()}} can leave a
{{StateMachineProcedure}} with {{holdLock()==true}} in an orphaned
state: {{ProcedureState.FAILED}}, exclusive lock held, and not present on any
scheduler queue. No event ever re-awakens it; the only recovery is master
failover (via {{loadProcedures() -> failedList.forEach(scheduler::addBack)}}).
In production we observed this as an HBase region stuck CLOSED for 5h 37m after
a {{SplitTableRegionProcedure}} hit "Recovered.edits are found" during
{{SPLIT_TABLE_REGIONS_CHECK_CLOSED_REGIONS}}. The region was completely
unavailable to clients for the entire duration. Master failover released the
lock and rollback finally ran.
The cause is a race between two workers: a parent procedure calls
{{setFailure()}} while a sibling/child procedure has not yet returned from
{{procStack.release()}}.
Relevant code paths (line numbers from branch-2.6):
* {{ProcedureExecutor.executeProcedure()}}, lines 1414-1489: the outer do-while loop.
* {{RootProcedureState.setRollback()}}, line 85: guarded by {{running == 0 && state == FAILED}}.
* {{RootProcedureState.acquire()}}, line 138: increments {{running}}; {{release()}} at line 150 decrements it.
* {{ProcedureExecutor.releaseLock()}}, lines 1502-1509: skips the release when {{proc.holdLock(env)==true && !proc.isFinished()}}. {{isFinished()}} is true only for SUCCESS/ROLLEDBACK, NOT for FAILED.
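To make the {{setRollback()}} guard concrete, here is a minimal sketch of the counter logic described in the list above. It is a simplified, hypothetical model ({{RootStateSketch}} and its method bodies are illustrative and are not copied from the HBase source). It also assumes that {{acquire()}} refuses new work once the state is no longer RUNNING, which is inferred from the timeline rather than quoted from the code.

```java
// Hypothetical, simplified model of the guards described above.
// This is NOT the HBase RootProcedureState source.
final class RootStateSketch {
    enum State { RUNNING, FAILED, ROLLINGBACK }

    private State state = State.RUNNING;
    private int running = 0;

    // Assumption: acquire() only succeeds while the root is RUNNING.
    synchronized boolean acquire() {
        if (state != State.RUNNING) {
            return false;
        }
        running++;
        return true;
    }

    // A worker that finished its step gives its slot back.
    synchronized void release() {
        running--;
    }

    synchronized void setFailure() {
        state = State.FAILED;
    }

    // Rollback may only start when no worker is active and the root has failed.
    synchronized boolean setRollback() {
        if (running == 0 && state == State.FAILED) {
            state = State.ROLLINGBACK;
            return true;
        }
        return false;
    }

    synchronized State state() {
        return state;
    }

    synchronized int running() {
        return running;
    }

    public static void main(String[] args) {
        RootStateSketch root = new RootStateSketch();
        root.acquire();                          // one worker still holds a slot
        root.setFailure();                       // another worker marks the root FAILED
        System.out.println(root.setRollback());  // false: running != 0
        root.release();                          // the first worker finally releases
        System.out.println(root.setRollback());  // true, but only if someone calls it again
    }
}
```

The second {{setRollback()}} call succeeds in this sketch only because {{main}} invokes it explicitly. In the scenario reported here, nothing makes that second call once the last worker has released.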
Timeline of the race:
|| T || Worker-A (child) || Worker-B (parent) || running || state ||
| 0 | acquire(child) | — | 1 | RUNNING |
| 1 | child execute returns SUCCESS | — | 1 | RUNNING |
| 2 | countDownChildren → scheduler.addFront(parent) | — | 1 | RUNNING |
| 3 | — | picks up parent | 1 | RUNNING |
| 4 | — | acquire(parent) | 2 | RUNNING |
| 5 | — | executeFromState throws, setFailure() | 2 | FAILED |
| 6 | — | execProcedure returns | 2 | FAILED |
| 7 | — | do-while re-enters, acquire() returns false | 2 | FAILED |
| 8 | — | setRollback() returns false (running != 0) | 2 | FAILED |
| 9 | — | else-branch, wasExecuted()==true, break; | 2 | FAILED |
| 10 | release(child) | Worker-B returns | 1 | FAILED |
From T+10 onward the procedure is FAILED, {{holdLock=true}} has prevented
{{releaseLock()}} at T+6 from releasing the exclusive lock, and nothing
re-enqueues the root. The child's {{countDownChildren}} wake-up was consumed
at T+3, and no other event source exists to schedule the procedure again.
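The lock retention can be illustrated with a small sketch of the skip condition quoted earlier ({{holdLock && !isFinished()}}). The class {{LockRetentionSketch}}, the enum, and the method {{keepsLock}} are hypothetical names used only for illustration; only the predicate itself comes from this report.

```java
// Hypothetical illustration of the releaseLock() skip condition described in this issue.
// Names are invented for the sketch; this is not the ProcedureExecutor source.
final class LockRetentionSketch {
    enum ProcState { RUNNABLE, SUCCESS, FAILED, ROLLEDBACK }

    // Per the report: finished means SUCCESS or ROLLEDBACK, not FAILED.
    static boolean isFinished(ProcState state) {
        return state == ProcState.SUCCESS || state == ProcState.ROLLEDBACK;
    }

    // True when the release is skipped and the exclusive lock stays held.
    static boolean keepsLock(boolean holdLock, ProcState state) {
        return holdLock && !isFinished(state);
    }

    public static void main(String[] args) {
        for (ProcState state : ProcState.values()) {
            System.out.println(state + " holdLock=true keepsLock=" + keepsLock(true, state));
        }
    }
}
```

Under this predicate a FAILED procedure with {{holdLock=true}} keeps its lock until it reaches ROLLEDBACK, which is why a rollback that never starts leaves the lock held indefinitely.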