[ https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661933#comment-16661933 ]
Hudson commented on HBASE-21364:
--------------------------------

Results for branch branch-2.1
[build #523 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/523/]: (x) *{color:red}-1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/523//General_Nightly_Build_Report/]

(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/523//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/523//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}


> Procedure holds the lock should put to front of the queue after restart
> -----------------------------------------------------------------------
>
>                 Key: HBASE-21364
>                 URL: https://issues.apache.org/jira/browse/HBASE-21364
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.1.0, 2.0.2
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Blocker
>             Fix For: 2.1.1, 2.0.3
>
>         Attachments: HBASE-21364.branch-2.0.001.patch,
> HBASE-21364.branch-2.0.002.patch
>
>
> After restoring the procedures from the Procedure WALs, we put the runnable
> procedures back into the queue to execute. The order was not a problem
> before HBASE-20846, since the first procedure to execute would acquire the
> lock itself. But since HBASE-20846 the locks are restored as well. If we
> execute a procedure without the lock before a procedure that holds the lock
> in the same queue, there is a race condition in which we may not be able to
> execute any of the procedures in that queue at all.
> The race condition is:
> 1. A procedure that needs the table's exclusive lock is put into the
> table's queue, but the table's shared lock is held by a Region Procedure.
> Since no one holds the exclusive lock, the queue is put into the run queue
> to execute. But soon the worker thread sees that the procedure can't
> execute because it doesn't hold the lock, so it stops executing and removes
> the queue from the run queue.
> 2. At the same time, the Region Procedure, which holds the table's shared
> lock and the region's exclusive lock, is put into the table's queue. But
> since the queue has already been added to the run queue, it is not added
> again.
> 3. Because of step 1, the table's queue is removed from the run queue.
> 4. After that, no one puts the table's queue back, so no worker will
> execute the procedures inside it.
> A test case in the patch shows how.



--
This message was sent by Atlassian JIRA (v7.6.3#76005)
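
The ordering named in the issue title can be sketched as follows. This is a minimal illustration and not the code from the attached patches: `Proc`, `requeue`, `names`, and the procedure names are hypothetical, and the real HBase scheduler is considerably more involved. It only shows the idea that, when restored procedures are put back, one that already holds its lock goes to the front of the queue so a worker polls it first.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class RestoreOrderSketch {
    /** Hypothetical stand-in for a restored procedure; not an HBase class. */
    static final class Proc {
        final String name;
        final boolean holdsLock;

        Proc(String name, boolean holdsLock) {
            this.name = name;
            this.holdsLock = holdsLock;
        }
    }

    /**
     * Puts restored procedures back into a queue. Procedures that already
     * hold their lock are added at the front, all others at the back.
     */
    static Deque<Proc> requeue(List<Proc> restored) {
        Deque<Proc> queue = new ArrayDeque<>();
        for (Proc p : restored) {
            if (p.holdsLock) {
                queue.addFirst(p);
            } else {
                queue.addLast(p);
            }
        }
        return queue;
    }

    /** Returns the procedure names in the order a worker would poll them. */
    static List<String> names(Deque<Proc> queue) {
        List<String> out = new ArrayList<>();
        for (Proc p : queue) {
            out.add(p.name);
        }
        return out;
    }

    public static void main(String[] args) {
        // Restore order as in the race above: the lock-less procedure first.
        List<Proc> restored = Arrays.asList(
            new Proc("tableProc", false),   // needs the table's exclusive lock
            new Proc("regionProc", true));  // already holds its locks
        System.out.println(names(requeue(restored))); // prints [regionProc, tableProc]
    }
}
```

With this ordering the worker thread picks up the lock holder first, so the queue is not dropped from the run queue merely because its head cannot acquire the lock.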