hgromer commented on PR #7084:
URL: https://github.com/apache/hbase/pull/7084#issuecomment-3024611037
I think we need to revert this PR. This change can cause another deadlock,
which can interfere with server crash procedures.
The SCP acquires a server exclusive lock, so it can run in parallel with a
snapshot procedure. However, the SCP schedules SplitWALRemoteProcedures, which
do acquire table locks. A SplitWALRemoteProcedure won't run until the snapshot
procedure finishes, but the snapshot procedure gets stuck at state
SNAPSHOT_SNAPSHOT_SPLIT_REGIONS waiting for the crashed server to come back online.
```
2025-07-01T15:23:20,413 [PEWorker-2] WARN org.apache.hadoop.hbase.master.procedure.SnapshotRegionProcedure: pid=2365228, ppid=2365224, state=RUNNABLE, locked=true; SnapshotRegionProcedure 91f810e77abe57ea0791ea6e86ada219 can not run currently because target server of region migrate-test-1,\x7F\xFF\xFF\xFE,1751380484623.91f810e77abe57ea0791ea6e86ada219. na1-elegant-jaded-egg.iad03.hubinternal.net,60020,1751300917328 is in state SPLITTING, wait 600000 ms to retry
```
This creates a cycle: the SCP's child procedures never finish, so the SCP never
finishes, and the snapshot procedure, which is waiting on that server, can never
finish either.
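To make the cycle explicit, here is a minimal sketch that models it as a wait-for graph and finds the loop with a depth-first search. The `WaitForGraph` class and its `waitsFor`/`findCycle` methods are hypothetical illustrations, not part of HBase's procedure framework; each edge simply restates one dependency described in this comment.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: an edge "a -> b" means "a cannot make progress until b does".
public class WaitForGraph {
  private final Map<String, List<String>> edges = new HashMap<>();

  public void waitsFor(String waiter, String blocker) {
    edges.computeIfAbsent(waiter, k -> new ArrayList<>()).add(blocker);
  }

  // Returns a cycle reachable from start (first node repeated at the end), or an empty list.
  public List<String> findCycle(String start) {
    return dfs(start, new ArrayList<>(), new HashSet<>(), new HashSet<>());
  }

  private List<String> dfs(String node, List<String> path, Set<String> onPath, Set<String> done) {
    if (onPath.contains(node)) {
      // Back edge: the cycle is the part of the current path that starts at node.
      List<String> cycle = new ArrayList<>(path.subList(path.indexOf(node), path.size()));
      cycle.add(node);
      return cycle;
    }
    if (done.contains(node)) {
      return List.of(); // already fully explored, no cycle through here
    }
    path.add(node);
    onPath.add(node);
    for (String next : edges.getOrDefault(node, List.of())) {
      List<String> cycle = dfs(next, path, onPath, done);
      if (!cycle.isEmpty()) {
        return cycle;
      }
    }
    path.remove(path.size() - 1);
    onPath.remove(node);
    done.add(node);
    return List.of();
  }

  public static void main(String[] args) {
    WaitForGraph g = new WaitForGraph();
    g.waitsFor("SCP", "SplitWALRemoteProcedure");               // parent waits for its child
    g.waitsFor("SplitWALRemoteProcedure", "SnapshotProcedure"); // blocked on the table lock
    g.waitsFor("SnapshotProcedure", "SCP");                     // SPLIT_REGIONS waits on the crashed server
    System.out.println(g.findCycle("SCP"));
    // prints [SCP, SplitWALRemoteProcedure, SnapshotProcedure, SCP]
  }
}
```

Removing any one of the three edges breaks the loop, which is the kind of change needed here: either the snapshot procedure must not hold the lock while it waits on the server, or the WAL-splitting children must not need that lock.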