Duo Zhang created HBASE-20939:
---------------------------------

             Summary: There will be race when we call suspendIfNotReady and 
then throw ProcedureSuspendedException
                 Key: HBASE-20939
                 URL: https://issues.apache.org/jira/browse/HBASE-20939
             Project: HBase
          Issue Type: Sub-task
            Reporter: Duo Zhang


This is a very typical pattern in our procedure implementations. For example,
in AssignProcedure we call AM.queueAssign and then suspend ourselves to wait
until the AM finishes processing our assign request.

But there can be a race. Consider this sequence:
1. We call suspendIfNotReady on an event, and it returns true, so we need to
wait.
2. The event is woken up, and the procedure is added back to the scheduler.
3. A worker picks up the procedure and finishes it.
4. We finally throw ProcedureSuspendedException, and the ProcedureExecutor
suspends us and stores the state in the procedure store.
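
The four steps above can be replayed deterministically in a toy model. This is
only a sketch: ToyProcedure, ToyEvent, the scheduler queue and the store map are
hypothetical stand-ins, not HBase classes, and the real interleaving happens
across threads rather than in one method.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Toy model only: none of these types are HBase classes.
class SuspendRaceSketch {
    enum State { RUNNABLE, WAITING, FINISHED }

    static final class ToyProcedure {
        final long id;
        State state = State.RUNNABLE;
        ToyProcedure(long id) { this.id = id; }
    }

    static final class ToyEvent {
        private boolean ready = false;
        private final List<ToyProcedure> waiters = new ArrayList<>();

        // Returns true when the caller has to wait (step 1).
        boolean suspendIfNotReady(ToyProcedure proc) {
            if (ready) {
                return false;
            }
            waiters.add(proc);
            return true;
        }

        // Marks the event ready and hands all waiters back to the scheduler (step 2).
        void wake(Queue<ToyProcedure> scheduler) {
            ready = true;
            scheduler.addAll(waiters);
            waiters.clear();
        }
    }

    // Replays steps 1-4 in the problematic order and returns the final store content.
    static Map<Long, State> replay() {
        Queue<ToyProcedure> scheduler = new ArrayDeque<>();
        Map<Long, State> store = new HashMap<>();
        ToyEvent event = new ToyEvent();
        ToyProcedure proc = new ToyProcedure(1L);

        // Step 1: the procedure registers as a waiter and is told to wait.
        boolean mustWait = event.suspendIfNotReady(proc);

        // Step 2: the event fires before the procedure has actually been suspended.
        event.wake(scheduler);

        // Step 3: a worker picks the procedure up and finishes it.
        ToyProcedure picked = scheduler.poll();
        picked.state = State.FINISHED;
        store.put(picked.id, picked.state);

        // Step 4: the original caller only now gets suspended and persisted.
        if (mustWait) {
            proc.state = State.WAITING;
            store.put(proc.id, proc.state);
        }
        return store;
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints {1=WAITING}
    }
}
```

In this model the store ends with the procedure recorded as WAITING even though
a worker already finished it, and the scheduler queue is empty, so nothing will
ever wake it again.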

So we have a half-done procedure in the procedure store forever. This may
cause assertion failures when loading procedures. The worker may also be unable
to finish the procedure, because when suspending we need to restore some state,
for example, add something to RootProcedureState. Either way, it will still
lead to assertion failures or other unexpected errors.

And this cannot be fixed by simply adding a lock in the procedure, as most
of the work is done in the ProcedureExecutor after we throw
ProcedureSuspendedException.
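
To illustrate why a lock taken inside the procedure does not cover the
executor's work, here is a minimal sketch. It assumes a hypothetical
ToySuspendedException and a plain ReentrantLock. It is not how HBase structures
its code. It only shows that a lock acquired inside execute() is already
released by the time the caller's catch block runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Toy model only: ToySuspendedException and procedureLock are hypothetical.
class LockScopeSketch {
    static final class ToySuspendedException extends Exception {}

    static final ReentrantLock procedureLock = new ReentrantLock();

    // Stand-in for a procedure's execute(): takes its own lock, then suspends.
    static void execute(List<String> trace) throws ToySuspendedException {
        procedureLock.lock();
        try {
            trace.add("procedure: lock held, must wait");
            throw new ToySuspendedException();
        } finally {
            // The lock is dropped while the exception unwinds out of execute().
            procedureLock.unlock();
            trace.add("procedure: lock released");
        }
    }

    // Stand-in for the executor: it reacts to the exception after execute() returns.
    static List<String> runOnce() {
        List<String> trace = new ArrayList<>();
        try {
            execute(trace);
        } catch (ToySuspendedException e) {
            trace.add("executor: suspending, lock held = "
                + procedureLock.isHeldByCurrentThread());
        }
        return trace;
    }

    public static void main(String[] args) {
        runOnce().forEach(System.out::println);
    }
}
```

The last trace entry reports that the lock is no longer held when the executor
starts its suspend handling, which is the window in which the event can be
woken up and the procedure rescheduled.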



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
