[ https://issues.apache.org/jira/browse/HBASE-21384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Allan Yang updated HBASE-21384:
-------------------------------
Attachment: HBASE-21384.branch-2.0.003.patch
> Procedure with holdlock=false should not be restored lock when restarts
> ------------------------------------------------------------------------
>
> Key: HBASE-21384
> URL: https://issues.apache.org/jira/browse/HBASE-21384
> Project: HBase
> Issue Type: Sub-task
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21384.branch-2.0.001.patch,
> HBASE-21384.branch-2.0.002.patch, HBASE-21384.branch-2.0.003.patch
>
>
> Yet another stuck-procedure case, similar to HBASE-21364.
> The scenario is:
> 1. A ModifyProcedure spawned a ReopenTableProcedure. Because its
> holdLock=false, it released the lock.
> 2. The ReopenTableProcedure, which also has holdLock=false, spawned several
> MoveRegionProcedures. Just after it stored the child procedures to the WAL
> and began to release the lock, the master was killed.
> 3. On restart, the ReopenTableProcedure's lock was restored because it had
> held the lock before. This is wrong: the procedure is now in WAITING state
> and its holdLock=false.
> 4. After the restart, a MoveRegionProcedure can execute because its parent
> has the lock. But when it spawns an AssignProcedure, the AssignProcedure
> can no longer execute: its parent does not have the lock, only its
> 'grandpa', the ReopenTableProcedure, does.
> 5. Restarting the master again does not help. The procedures stay stuck
> because the lock is restored for the ReopenTableProcedure every time.
> Two fixes:
> 1. Do not restore the lock if the procedure has holdLock=false and is in
> WAITING state.
> 2. A procedure that does not have the lock itself but whose parent has it
> should also be put at the front of the queue, as an addendum to HBASE-21364.
> Discussion:
> Should we check the locks of all ancestors, not only the parent? As noted
> in the comments of the patch, once the issue above is fixed, checking the
> parent is enough.
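
For illustration, the two fixes described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the attached patch: the class LockRestoreSketch, its Proc record, the State enum, and the methods shouldRestoreLock and goesToFrontOfQueue are all hypothetical names, and none of them are HBase's actual procedure classes or APIs.

```java
// Hypothetical, simplified model of the two fixes; not HBase's real classes.
final class LockRestoreSketch {

    /** Assumed subset of procedure states, for illustration only. */
    enum State { RUNNABLE, WAITING, SUCCESS }

    /** Minimal stand-in for a procedure record reloaded from the WAL. */
    static final class Proc {
        final Proc parent;              // null for a root procedure
        final boolean holdLock;         // true if the lock is kept while suspended
        final State state;              // state found in the WAL
        final boolean lockedWhenStored; // lock flag found in the WAL
        boolean hasLock;                // whether the lock is held right now

        Proc(Proc parent, boolean holdLock, State state, boolean lockedWhenStored) {
            this.parent = parent;
            this.holdLock = holdLock;
            this.state = state;
            this.lockedWhenStored = lockedWhenStored;
        }
    }

    /**
     * Fix 1: on restart, skip lock restoration for a procedure that has
     * holdLock=false and is in WAITING state, even if the WAL says it was
     * holding the lock when it was last stored.
     */
    static boolean shouldRestoreLock(Proc p) {
        if (!p.lockedWhenStored) {
            return false; // nothing to restore
        }
        if (!p.holdLock && p.state == State.WAITING) {
            return false; // it was about to release the lock anyway
        }
        return true;
    }

    /**
     * Fix 2: a procedure goes to the front of the queue if it has the lock
     * itself or if its direct parent has it. Only the parent is checked,
     * matching the conclusion in the Discussion section.
     */
    static boolean goesToFrontOfQueue(Proc p) {
        return p.hasLock || (p.parent != null && p.parent.hasLock);
    }
}
```

In this model, a ReopenTableProcedure reloaded as WAITING with holdLock=false does not get its lock back, so its grandchildren are no longer blocked by a lock that only the 'grandpa' holds.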