[ 
https://issues.apache.org/jira/browse/HBASE-20939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559199#comment-16559199
 ] 

Duo Zhang commented on HBASE-20939:
-----------------------------------

{quote}
So IdLock is about serializing thread access and region lock is about this 
procedure having exclusive lock on the region entity till done. Man, this 
complicated. We should make the framework ensure single-threaded execution. I 
can't think when we'd want different. Its the suspend action that makes life 
interesting
{quote}

This is not easy, I do not have an idea in mind to unify them yet. Actually, 
the region lock is for procedure, and the IdLock is for thread, they are two 
dimensions.

And on the implementation for now, we do allow a procedure to pass the 
exclusive lock check if it has already held the lock, this is necessary, and 
also easy to understand, just the same with the ReentrantLock in java. So it is 
impossible to prevent concurrent execution for the same procedure through the 
procedure lock, for now...

> There will be race when we call suspendIfNotReady and then throw 
> ProcedureSuspendedException
> --------------------------------------------------------------------------------------------
>
>                 Key: HBASE-20939
>                 URL: https://issues.apache.org/jira/browse/HBASE-20939
>             Project: HBase
>          Issue Type: Sub-task
>          Components: amv2
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Critical
>             Fix For: 3.0.0, 2.0.1, 2.2.0, 2.1.1
>
>         Attachments: HBASE-20939.patch
>
>
> This is very typical usage in our procedure implementation, for example, in 
> AssignProcedure, we will call AM.queueAssign and then suspend ourselves to 
> wait until the AM finish processing our assign request.
> But there could be races. Think of this:
> 1. We call suspendIfNotReady on a event, and it returns true so we need to 
> wait.
> 2. The event has been waked up, and the procedure will be added back to the 
> scheduler.
> 3. A worker picks up the procedure and finishes it.
> 4. We finally throw ProcedureSuspendException and the ProcedureExecutor 
> suspend us and store the state in procedure store.
> So we have a half done procedure in the procedure store for ever... This may 
> cause assertion when loading procedures. And maybe the worker can not finish 
> the procedure as when suspending we need to restore some state, for example, 
> add something to RootProcedureState. But anyway, it will still lead to 
> assertion or other unexpected errors.
> And this can not be done by simply adding a lock in the procedure, as most 
> works are done in the ProcedureExecutor after we throw 
> ProcedureSuspendException.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to