[
https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552366#comment-16552366
]
Duo Zhang commented on HBASE-20846:
-----------------------------------
Let me commit to branch-2.1+.
[~stack] For 2.0 I think we could open a backport issue for it?
> Restore procedure locks when master restarts
> --------------------------------------------
>
> Key: HBASE-20846
> URL: https://issues.apache.org/jira/browse/HBASE-20846
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.1.0
> Reporter: Allan Yang
> Assignee: Duo Zhang
> Priority: Major
> Fix For: 3.0.0, 2.0.2, 2.1.1
>
> Attachments: HBASE-20846-v1.patch, HBASE-20846-v2.patch,
> HBASE-20846-v3.patch, HBASE-20846-v4.patch, HBASE-20846-v4.patch,
> HBASE-20846-v4.patch, HBASE-20846-v5.patch, HBASE-20846-v6.patch,
> HBASE-20846.branch-2.0.002.patch, HBASE-20846.branch-2.0.patch,
> HBASE-20846.patch
>
>
> Found this one when investigating why a ModifyTableProcedure got stuck while
> a MoveRegionProcedure was in progress after a master restart.
> That issue can be solved by HBASE-20752, but I discovered something
> else.
> Before a MoveRegionProcedure can execute, it takes the table's shared
> lock. So when an UnassignProcedure is spawned, it does not check the
> table's shared lock, since it assumes that its parent (MoveRegionProcedure)
> has already acquired the table's lock.
> {code:java}
> // If there is parent procedure, it would have already taken xlock, so no need to take
> // shared lock here. Otherwise, take shared lock.
> if (!procedure.hasParent()
>     && waitTableQueueSharedLock(procedure, table) == null) {
>   return true;
> }
> {code}
> But this is not the case when the master has been restarted. The child
> procedure (UnassignProcedure) is executed first after the restart. Though it
> has a parent (MoveRegionProcedure), the parent apparently does not hold the
> table's lock.
> So the child begins to execute without holding the table's shared lock, and a
> ModifyTableProcedure can acquire the table's exclusive lock and execute at the
> same time, which is not possible if the master has not been restarted.
> This caused a hang before HBASE-20752. Since HBASE-20752 has been fixed,
> I wrote a simple UT to reproduce this case.
> I think we don't have to check the parent for the table's shared lock. It is a
> shared lock, right? I think we can acquire it every time we need it.
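
The race described above can be illustrated with a toy model. This is only a sketch of the reporter's reasoning, not HBase code and not the committed patch: `TableLock`, `tryShared`, `tryExclusive`, and `modifyTableCanRunConcurrently` are hypothetical names invented for illustration, and the model assumes that in-memory lock state starts out empty after a restart.

```java
import java.util.HashSet;
import java.util.Set;

public class TableLockSketch {

    /** Toy table lock (hypothetical): many shared holders or one exclusive holder. */
    static final class TableLock {
        private final Set<String> sharedHolders = new HashSet<>();
        private String exclusiveHolder;

        boolean tryShared(String proc) {
            if (exclusiveHolder != null) {
                return false;
            }
            sharedHolders.add(proc);
            return true;
        }

        boolean tryExclusive(String proc) {
            if (exclusiveHolder != null || !sharedHolders.isEmpty()) {
                return false;
            }
            exclusiveHolder = proc;
            return true;
        }
    }

    /**
     * Models the state right after a restart: the lock is empty because the
     * parent's shared lock was not restored, and the child runs first.
     * Returns true if a ModifyTableProcedure could take the exclusive lock
     * while the child is running.
     */
    public static boolean modifyTableCanRunConcurrently(boolean skipSharedLockForChildren) {
        TableLock lock = new TableLock(); // assumption: lock state is lost on restart
        boolean childHasParent = true;    // UnassignProcedure spawned by MoveRegionProcedure

        // Mirrors the quoted check: a child skips the shared lock when it has a parent.
        if (!(skipSharedLockForChildren && childHasParent)) {
            lock.tryShared("UnassignProcedure");
        }
        return lock.tryExclusive("ModifyTableProcedure");
    }

    public static void main(String[] args) {
        System.out.println("child skips shared lock:  "
            + modifyTableCanRunConcurrently(true));
        System.out.println("child always takes shared: "
            + modifyTableCanRunConcurrently(false));
    }
}
```

In this model, skipping the shared lock for children lets the exclusive lock be granted while the child runs, whereas always taking the shared lock blocks it. Restoring the parent's lock on restart, as the issue title suggests, would close the same gap from the other side.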