[ 
https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16541364#comment-16541364
 ] 

Duo Zhang commented on HBASE-20846:
-----------------------------------

OK, found another problem. I want to call acquireLock to restore the lock 
state, but this does not work when master restarts. For most procedures, we 
call env.waitInitialized to wait until master has been initialized. This is 
reasonable. But we need to finish procedure executor initialization before 
loading meta, which means we need to restore the procedure locks before master 
has been initialized, and that leads to a deadlock...

A proper way to fix this is to split lock acquisition into two stages and put 
the waitInitialized call in the pre-check stage. But the logic will be more 
complicated, as it is possible that we have not passed the pre-check yet but 
already hold the lock...
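
To make the two-stage idea concrete, here is a minimal sketch. Every name in it (TwoStageLockSketch, Env, Proc, preCheck, takeLock, restoreLock) is hypothetical and not part of the HBase procedure framework; it only models the states described above, under the assumption that the pre-check is a non-blocking test of the "master initialized" flag.

```java
// Hypothetical sketch: none of these types are HBase classes.
final class TwoStageLockSketch {

    /** Stand-in for the master environment; only tracks the "initialized" flag. */
    static final class Env {
        private boolean initialized;

        boolean isInitialized() { return initialized; }

        void markInitialized() { initialized = true; }
    }

    /** Stand-in for a procedure whose lock acquisition is split into two stages. */
    static final class Proc {
        private boolean preCheckPassed;
        private boolean holdsLock;

        // Stage 1: the pre-check. Here it is a non-blocking test of the flag,
        // standing in for the wait on master initialization.
        boolean preCheck(Env env) {
            preCheckPassed = env.isInitialized();
            return preCheckPassed;
        }

        // Stage 2: take the lock itself. It never depends on master initialization.
        void takeLock() { holdsLock = true; }

        // Normal path: run both stages in order.
        boolean acquire(Env env) {
            if (!preCheck(env)) {
                return false;
            }
            takeLock();
            return true;
        }

        // Restart path: restore the lock state by running stage 2 only,
        // so it can happen before the master has been initialized.
        void restoreLock() { takeLock(); }

        boolean holdsLock() { return holdsLock; }

        // A procedure may run only when both stages are done.
        boolean canExecute() { return preCheckPassed && holdsLock; }
    }

    public static void main(String[] args) {
        Env env = new Env();
        Proc restored = new Proc();

        restored.restoreLock(); // during executor load, master not initialized yet
        System.out.println("holds lock: " + restored.holdsLock());
        System.out.println("can execute: " + restored.canExecute());

        env.markInitialized();  // master finishes initialization later
        restored.preCheck(env);
        System.out.println("can execute after init: " + restored.canExecute());
    }
}
```

The restored procedure sits in exactly the awkward state mentioned above: it holds the lock but has not passed the pre-check, so anything that decides whether it may run has to look at both flags.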

> Table's shared lock is not held by sub-procedures after master restart
> ----------------------------------------------------------------------
>
>                 Key: HBASE-20846
>                 URL: https://issues.apache.org/jira/browse/HBASE-20846
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.1.0
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Major
>             Fix For: 3.0.0, 2.0.2, 2.1.1
>
>         Attachments: HBASE-20846.branch-2.0.002.patch, 
> HBASE-20846.branch-2.0.patch
>
>
> Found this one when investigating a ModifyTableProcedure that got stuck while 
> a MoveRegionProcedure was running after master restart.
> Though this issue can be solved by HBASE-20752, I discovered something 
> else.
> Before a MoveRegionProcedure can execute, it will hold the table's shared 
> lock. So, when an UnassignProcedure is spawned, it will not check the 
> table's shared lock since it is sure that its parent (MoveRegionProcedure) has 
> acquired the table's lock.
> {code:java}
>       // If there is parent procedure, it would have already taken xlock, so no
>       // need to take shared lock here. Otherwise, take shared lock.
>       if (!procedure.hasParent()
>           && waitTableQueueSharedLock(procedure, table) == null) {
>         return true;
>       }
> {code}
> But this is not the case when the master was restarted. The child 
> procedure (UnassignProcedure) will be executed first after restart. Though it 
> has a parent (MoveRegionProcedure), apparently the parent didn't hold the 
> table's lock.
> So, since it began to execute without holding the table's shared lock, a 
> ModifyTableProcedure can acquire the table's exclusive lock and execute at the 
> same time, which is not possible if the master was not restarted.
> This would cause procedures to get stuck before HBASE-20752. But since 
> HBASE-20752 has been fixed, I wrote a simple UT to reproduce this case.
> I think we don't have to check the parent for table's shared lock. It is a 
> shared lock, right? I think we can acquire it every time we need it.
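
The suggestion at the end of the description (take the shared lock every time, parent or not) can be illustrated with a small counting lock. This is only a sketch under assumptions: TableLockSketch and its methods are hypothetical names, not the HBase scheduler's lock implementation, and real queue and wake-up handling is left out.

```java
// Hypothetical sketch: not the HBase procedure scheduler's lock class.
final class TableLockSketch {
    private int sharedHolders;     // number of procedures holding the shared lock
    private boolean exclusiveHeld; // true while one procedure holds the exclusive lock

    // Any number of procedures may hold the shared lock at once,
    // so a child can take it even if its parent already holds it.
    synchronized boolean tryShared() {
        if (exclusiveHeld) {
            return false;
        }
        sharedHolders++;
        return true;
    }

    synchronized void releaseShared() {
        if (sharedHolders == 0) {
            throw new IllegalStateException("shared lock is not held");
        }
        sharedHolders--;
    }

    // The exclusive lock is granted only when nobody holds the lock in any mode.
    synchronized boolean tryExclusive() {
        if (exclusiveHeld || sharedHolders > 0) {
            return false;
        }
        exclusiveHeld = true;
        return true;
    }

    synchronized void releaseExclusive() {
        exclusiveHeld = false;
    }

    public static void main(String[] args) {
        TableLockSketch table = new TableLockSketch();

        // After a restart only the child runs first; it takes the shared lock itself.
        boolean childLocked = table.tryShared();
        // A table modification asking for the exclusive lock is now refused.
        boolean modifyLocked = table.tryExclusive();

        System.out.println("child holds shared lock: " + childLocked);
        System.out.println("modify got exclusive lock: " + modifyLocked);
    }
}
```

Because the child counts as a holder in its own right, the exclusive request is refused whether or not the parent's lock state was restored after the restart.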



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
