[ 
https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16540152#comment-16540152
 ] 

Duo Zhang commented on HBASE-20846:
-----------------------------------

What I want to do is add a field called 'locked' to the base Procedure 
class to record the acquisition and release of the procedure lock; it will be 
serialized in the protobuf message. It will be used only for updating to and 
loading from ProcedureStore. There is already a 'hasLock' method, which is 
supposed to be used together with holdLock, so we need to add clear 
documentation explaining the difference between these two methods/fields.

When acquiring the lock before executing the procedure returns LOCK_ACQUIRED, 
we will set the locked flag to true and make a ProcedureStore.update call to 
record that we hold the lock, before actually executing the procedure. After 
executing one step of the procedure, we will decide whether we need to release 
the lock. If so, we will set the locked flag to false and make a 
ProcedureStore.update call to record that we have released the lock. (This 
extra call may not be needed, as we will do an update later anyway, but the 
problem is that we need to make sure the update comes before the actual 
release of the lock.)

When loading, we will call acquireLock on a procedure if its locked flag is 
true. And when executing a procedure, if the locked flag is true, we do not 
need to call acquireLock again.
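
To make the ordering concrete, here is a minimal sketch of the flow described 
above. It is not the HBase code: Store, Proc, executeStep and load are 
hypothetical stand-ins for ProcedureStore, Procedure, the executor's step loop 
and the load path, and the lock itself is reduced to a boolean.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the proposed 'locked' flag; not the HBase implementation. */
final class LockedFlagSketch {

  /** Stand-in for ProcedureStore: records each persisted value of the flag. */
  static final class Store {
    final List<Boolean> persistedLocked = new ArrayList<>();

    void update(Proc p) {
      persistedLocked.add(p.locked);
    }
  }

  /** Stand-in for a procedure; 'locked' is the field that would be serialized. */
  static final class Proc {
    boolean locked;    // persisted: did we hold the lock at the last store update?
    boolean lockHeld;  // in-memory only: is the lock actually held right now?
    int acquireCalls;  // counts acquireLock() calls, for illustration only

    boolean acquireLock() {
      acquireCalls++;
      lockHeld = true;
      return true;     // assumption: acquisition always succeeds in this sketch
    }

    void releaseLock() {
      lockHeld = false;
    }
  }

  /** Runs one step: acquire if needed, persist, execute, then optionally release. */
  static void executeStep(Proc p, Store store, boolean releaseAfterStep) {
    if (!p.locked) {
      if (!p.acquireLock()) {
        return;         // a real executor would yield and retry later
      }
      p.locked = true;
      store.update(p);  // record the acquisition before executing the step
    }
    // ... the procedure's own step would run here ...
    if (releaseAfterStep) {
      p.locked = false;
      store.update(p);  // persist the release before actually releasing
      p.releaseLock();
    }
  }

  /** Load path after a restart: restore the lock from the persisted flag. */
  static Proc load(boolean persistedLocked) {
    Proc p = new Proc();
    p.locked = persistedLocked;
    if (p.locked) {
      p.acquireLock();
    }
    return p;
  }
}
```

The point of the sketch is only the ordering: the store update always comes 
before the state change it describes becomes visible to other procedures, so a 
restart never finds a procedure running without a lock it believes it holds.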

> Table's shared lock is not held by sub-procedures after master restart
> ----------------------------------------------------------------------
>
>                 Key: HBASE-20846
>                 URL: https://issues.apache.org/jira/browse/HBASE-20846
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.1.0
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Major
>             Fix For: 3.0.0, 2.0.2, 2.1.1
>
>         Attachments: HBASE-20846.branch-2.0.002.patch, 
> HBASE-20846.branch-2.0.patch
>
>
> Found this one when investigating why a ModifyTableProcedure got stuck while 
> a MoveRegionProcedure was in progress after a master restart.
> This issue can be solved by HBASE-20752, but I discovered something else.
> Before a MoveRegionProcedure can execute, it will hold the table's shared 
> lock. So when an UnassignProcedure is spawned, it will not check the 
> table's shared lock, since it is sure that its parent (MoveRegionProcedure) 
> has acquired the table's lock.
> {code:java}
>       // If there is parent procedure, it would have already taken xlock, so
>       // no need to take shared lock here. Otherwise, take shared lock.
>       if (!procedure.hasParent()
>           && waitTableQueueSharedLock(procedure, table) == null) {
>           return true;
>       }
> {code}
> But that is not the case when the master is restarted. The child procedure 
> (UnassignProcedure) will be executed first after the restart. Though it has a 
> parent (MoveRegionProcedure), the parent apparently does not hold the table's 
> lock.
> So, since it begins to execute without holding the table's shared lock, a 
> ModifyTableProcedure can acquire the table's exclusive lock and execute at 
> the same time, which is not possible if the master was not restarted.
> This would cause a hang before HBASE-20752. But since HBASE-20752 has been 
> fixed, I wrote a simple UT to reproduce this case.
> I think we don't have to check the parent for the table's shared lock. It is 
> a shared lock, right? I think we can acquire it every time we need it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
