[ https://issues.apache.org/jira/browse/HBASE-21384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang updated HBASE-21384:
------------------------------
    Fix Version/s: 2.0.3
                   2.1.1
                   2.2.0
                   3.0.0

> Procedure with holdlock=false should not be restored lock when restarts
> ------------------------------------------------------------------------
>
>                 Key: HBASE-21384
>                 URL: https://issues.apache.org/jira/browse/HBASE-21384
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Major
>             Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
>         Attachments: HBASE-21384.branch-2.0.001.patch,
> HBASE-21384.branch-2.0.002.patch
>
>
> Yet another case of stuck procedures, similar to HBASE-21364.
> The case is:
> 1. A ModifyProcedure spawned a ReopenTableProcedure and, since its
> holdLock=false, it released the lock.
> 2. The ReopenTableProcedure spawned several MoveRegionProcedures. It also has
> holdLock=false, but just after it stored the child procedures to the WAL
> and began to release the lock, the master was killed.
> 3. On restart, the ReopenTableProcedure's lock was restored because it
> held the lock before the crash. This is not right, since it is now in
> WAITING state and its holdLock=false.
> 4. After the restart, a MoveRegionProcedure can execute since its parent has
> the lock, but when it spawns an AssignProcedure, the AssignProcedure
> can't execute, since its parent doesn't have the lock; its
> 'grandpa', the ReopenTableProcedure, does.
> 5. Restarting the master again does not help. The procedures stay stuck,
> because the lock is restored for the ReopenTableProcedure every time.
> Two fixes:
> 1. We should not restore the lock if the procedure has holdLock=false and is
> in WAITING state.
> 2. Procedures that don't have the lock but whose parent has it should also
> be put at the front of the queue, as an addendum to HBASE-21364.
> Discussion:
> Should we check the lock of all ancestors, not only the parent? As addressed
> in the comments of the patch, currently, once the issue above is fixed,
> checking the parent is enough.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
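
For illustration only, the two fixes proposed in the issue can be sketched in a few lines of Java. This is a simplified model, not the HBase procedure framework and not the attached patch: the `Proc` class, its fields (`lockedWhenStored`, `hasLock`), the `ProcState` enum, and the helpers `shouldRestoreLock` and `enqueue` are all hypothetical names made up for this sketch. It also assumes that procedures already holding the lock go to the front of the queue, which is how the issue describes HBASE-21364.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified model; these are NOT the real HBase classes.
enum ProcState { RUNNABLE, WAITING }

final class Proc {
    final String name;
    final Proc parent;              // null for a root procedure
    final boolean holdLock;         // true: keeps the lock while waiting on children
    final ProcState state;          // state as loaded from the WAL
    final boolean lockedWhenStored; // lock flag as last persisted (assumed field)
    boolean hasLock;                // in-memory lock state after the restart

    Proc(String name, Proc parent, boolean holdLock,
         ProcState state, boolean lockedWhenStored) {
        this.name = name;
        this.parent = parent;
        this.holdLock = holdLock;
        this.state = state;
        this.lockedWhenStored = lockedWhenStored;
    }
}

final class LockRestoreSketch {
    // Fix 1: do not restore the lock for a procedure that has
    // holdLock=false and is in WAITING state.
    static boolean shouldRestoreLock(Proc p) {
        if (!p.lockedWhenStored) {
            return false; // nothing was held, nothing to restore
        }
        return p.holdLock || p.state != ProcState.WAITING;
    }

    static void restoreLock(Proc p) {
        p.hasLock = shouldRestoreLock(p);
    }

    // Fix 2: a procedure goes to the front of the queue if it holds the
    // lock itself or if its direct parent holds it. Only the parent is
    // checked, matching the "Discussion" note in the issue.
    static void enqueue(Deque<Proc> queue, Proc p) {
        boolean parentHasLock = p.parent != null && p.parent.hasLock;
        if (p.hasLock || parentHasLock) {
            queue.addFirst(p);
        } else {
            queue.addLast(p);
        }
    }

    public static void main(String[] args) {
        // Step 3 of the issue: ReopenTableProcedure was killed mid-release.
        Proc reopen = new Proc("ReopenTableProcedure", null, false,
                               ProcState.WAITING, true);
        restoreLock(reopen);
        System.out.println(reopen.name + " lock restored: " + reopen.hasLock);

        // A parent with holdLock=true keeps its lock; its child jumps the queue.
        Proc parent = new Proc("parent", null, true, ProcState.WAITING, true);
        Proc child = new Proc("child", parent, false, ProcState.RUNNABLE, false);
        Proc other = new Proc("other", null, false, ProcState.RUNNABLE, false);
        restoreLock(parent);
        Deque<Proc> queue = new ArrayDeque<>();
        enqueue(queue, other);
        enqueue(queue, child);
        System.out.println("front of queue: " + queue.peekFirst().name);
    }
}
```

In this model the ReopenTableProcedure from step 3 no longer gets its lock back on restart, so the 'grandpa' situation in step 4 does not arise, and checking only the direct parent in `enqueue` is sufficient.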