hgromer commented on PR #7084:
URL: https://github.com/apache/hbase/pull/7084#issuecomment-2970866609

   > > > When implementing this SnapshotProcedure, we decided to use shared 
lock and holdLock = true to prevent other table procedure jumps in in the 
middle of our execution while not hurting the availability because exclusive 
lock will also prevent region assigning.
   > > > In [HBASE-28683](https://issues.apache.org/jira/browse/HBASE-28683), 
we introduced a new way to only allow one procedure to run at the same time for 
the same table, so maybe a possible way is to make SnapshotProcedure also 
acquire exclusive lock and set holdLock = false, so it will not be executed at 
the same time with Enable and Disable procedure.
   > > > Thanks.
   > > 
   > > 
   > > That's good to know, thank you for the context. The issue here is that 
the EnableTableProcedure will release the lock after it's subprocedures finish, 
which allows the SnapshotProcedure to execute on a table that is enabling. The 
SnapshotProcedure then gets stuck continuously re-running the same state over 
again, and will refuse to release the lock, creating a deadlock
   > > With all this said, I think it makes the most sense, for simplicity's 
sake, to fail the procedure if the table is in an invalid state.
   > 
   > Using the mechanism in 
[HBASE-28683](https://issues.apache.org/jira/browse/HBASE-28683), 
SnapshotProcedure can not be executed together with 
EnableTableProcedure/DisabledTableProcedure together, which could also solve 
the problem, and I think it is more stable. I'm not sure whether 
ModifyTableProcedure could also affect SnapshotProcedure if it jumps in just in 
the middle of the execution...
   
   Yes, I'm able to avoid the deadlock if the SnapshotProcedure both sets 
`holdLock = true` and also yields after procedure execution. Essentially 
allowing both SnapshotProcedure and EnableTableProcedure to execute without the 
need to throw any sort of yield / suspended exception (which was my initial 
implementation). 
   
   This is certainly a cleaner implementation, though it means we can 
interleave other table procedures after cycles of the SnapshotProcedure. 
   
   You hint at this here
   
   > I'm not sure whether ModifyTableProcedure could also affect 
SnapshotProcedure if it jumps in just in the middle of the execution...
   
   and I believe that it would be problematic because you may snapshot regions 
multiple times, regions may move or go offline, so you may miss out on 
snapshotting regions. 
   
   My initial implementation prevented this by only allowing 
SNAPSHOT_WRITE_SNAPSHOT_INFO to release the lock (when it's still safe to do 
so). However, I don't think it's safe to yield the lock after _any_ cycle. 
   
   After some thought, I think we still have two solutions here:
   
   1. My initial implementation, which throws an exception that indicates we 
should release the lock from within SNAPSHOT_WRITE_SNAPSHOT_INFO
   2. Fail the procedure
   
   It seems the consensus is to avoid complicating the procedure and fail if we 
encounter the table in an invalid state, so I'm leaning towards 2


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to