[ 
https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhuobin zheng updated HBASE-20846:
----------------------------------
    Description: 
Found this one when investigating ModifyTableProcedure got stuck while there 
was a MoveRegionProcedure going on after master restart.
Though this issue can be solved by HBASE-20752. But I discovered something else.
Before a MoveRegionProcedure can execute, it will hold the table's shared lock. 
so,, when a UnassignProcedure was spwaned, it will not check the table's shared 
lock since it is sure that its parent(MoveRegionProcedure) has aquired the 
table's lock.
{code:java}
// If there is parent procedure, it would have already taken xlock, so no need 
to take
      // shared lock here. Otherwise, take shared lock.
      if (!procedure.hasParent()
          && waitTableQueueSharedLock(procedure, table) == null) {
          return true;
      }
{code}
But, it is not the case when Master was restarted. The child 
procedure(UnassignProcedure) will be executed first after restart. Though it 
has a parent(MoveRegionProcedure), but apprently the parent didn't hold the 
table's lock.
So, since it began to execute without hold the table's shared lock. A 
ModifyTableProcedure can aquire the table's exclusive lock and execute at the 
same time. Which is not possible if the master was not restarted.
This will cause a stuck before HBASE-20752. But since HBASE-20752 has fixed, I 
wrote a simple UT to repo this case.

I think we don't have to check the parent for table's shared lock. It is a 
shared lock, right? I think we can acquire it every time we need it.
 

  was:
Found this one when investigating ModifyTableProcedure got stuck while there 
was a MoveRegionProcedure going on after master restart.
Though this issue can be solved by HBASE-20752. But I discovered something else.
Before a MoveRegionProcedure can execute, it will hold the table's shared lock. 
so,, when a UnassignProcedure was spwaned, it will not check the table's shared 
lock since it is sure that its parent(MoveRegionProcedure) has aquired the 
table's lock.
{code:java}
// If there is parent procedure, it would have already taken xlock, so no need 
to take
      // shared lock here. Otherwise, take shared lock.
      if (!procedure.hasParent()
          && waitTableQueueSharedLock(procedure, table) == null) {
          return true;
      }
{code}

But, it is not the case when Master was restarted. The child 
procedure(UnassignProcedure) will be executed first after restart. Though it 
has a parent(MoveRegionProcedure), but apprently the parent didn't hold the 
table's lock.
So, since it began to execute without hold the table's shared lock. A 
ModifyTableProcedure can aquire the table's exclusive lock and execute at the 
same time. Which is not possible if the master was not restarted.
This will cause a stuck before HBASE-20752. But since HBASE-20752 has fixed, I 
wrote a simple UT to repo this case.

I think we don't have to check the parent for table's shared lock. It is a 
shared lock, right? I think we can acquire it every time we need it.


> Restore procedure locks when master restarts
> --------------------------------------------
>
>                 Key: HBASE-20846
>                 URL: https://issues.apache.org/jira/browse/HBASE-20846
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.1.0
>            Reporter: Allan Yang
>            Assignee: Duo Zhang
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.2.0, 2.1.1
>
>         Attachments: HBASE-20846-v1.patch, HBASE-20846-v2.patch, 
> HBASE-20846-v3.patch, HBASE-20846-v4.patch, HBASE-20846-v4.patch, 
> HBASE-20846-v4.patch, HBASE-20846-v5.patch, HBASE-20846-v6.patch, 
> HBASE-20846.branch-2.0.002.patch, HBASE-20846.branch-2.0.patch, 
> HBASE-20846.patch
>
>
> Found this one when investigating ModifyTableProcedure got stuck while there 
> was a MoveRegionProcedure going on after master restart.
> Though this issue can be solved by HBASE-20752. But I discovered something 
> else.
> Before a MoveRegionProcedure can execute, it will hold the table's shared 
> lock. so,, when a UnassignProcedure was spwaned, it will not check the 
> table's shared lock since it is sure that its parent(MoveRegionProcedure) has 
> aquired the table's lock.
> {code:java}
> // If there is parent procedure, it would have already taken xlock, so no 
> need to take
>       // shared lock here. Otherwise, take shared lock.
>       if (!procedure.hasParent()
>           && waitTableQueueSharedLock(procedure, table) == null) {
>           return true;
>       }
> {code}
> But, it is not the case when Master was restarted. The child 
> procedure(UnassignProcedure) will be executed first after restart. Though it 
> has a parent(MoveRegionProcedure), but apprently the parent didn't hold the 
> table's lock.
> So, since it began to execute without hold the table's shared lock. A 
> ModifyTableProcedure can aquire the table's exclusive lock and execute at the 
> same time. Which is not possible if the master was not restarted.
> This will cause a stuck before HBASE-20752. But since HBASE-20752 has fixed, 
> I wrote a simple UT to repo this case.
> I think we don't have to check the parent for table's shared lock. It is a 
> shared lock, right? I think we can acquire it every time we need it.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to