[ 
https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang updated HBASE-21364:
-------------------------------
    Attachment: HBASE-21364.branch-2.0.002.patch

> Procedure holds the lock should put to front of the queue after restart
> -----------------------------------------------------------------------
>
>                 Key: HBASE-21364
>                 URL: https://issues.apache.org/jira/browse/HBASE-21364
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.1.0, 2.0.2
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Major
>         Attachments: HBASE-21364.branch-2.0.001.patch, 
> HBASE-21364.branch-2.0.002.patch
>
>
> After restoring the procedures from the Procedure WALs, we put the runnable 
> procedures back into the queue to execute. The order was not a problem before 
> HBASE-20846, since the first procedure to execute would acquire the lock 
> itself. But since HBASE-20846, the locks are restored as well. If a procedure 
> without the lock is executed before a procedure that holds the lock in the 
> same queue, there is a race condition in which none of the procedures in that 
> queue may ever be executed.
> The race condition is:
> 1. A procedure that needs the table's exclusive lock is put into the 
> table's queue, but the table's shared lock is held by a Region Procedure. 
> Since no one holds the exclusive lock, the queue is added to the run queue to 
> execute. But soon the worker thread sees that the procedure can't execute 
> because it doesn't hold the lock, so it stops executing and removes the queue 
> from the run queue.
> 2. At the same time, the Region Procedure, which holds the table's shared lock 
> and the region's exclusive lock, is put into the table's queue. But since the 
> queue has already been added to the run queue, it is not added again.
> 3. Because of step 1, the table's queue is removed from the run queue.
> 4. After that, no one puts the table's queue back, so no worker will execute 
> the procedures inside it.
> A test case in the patch shows how.
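
The stall described above, and the ordering named in the issue title, can be illustrated with a minimal single-threaded sketch. This is not HBase code and not the code in the attached patch: the class RestartOrderSketch, the Proc type, and the load/drain helpers are all hypothetical names, and the model assumes one worker and one table queue. It only shows that a blocked head leaves the queue stuck, while placing lock holders at the front lets the queue drain.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified model of one table queue after a restart.
final class RestartOrderSketch {

    // Stand-in for a procedure restored from the Procedure WALs.
    static final class Proc {
        final String name;
        final boolean needsExclusive;  // wants the table's exclusive lock
        final boolean holdsSharedLock; // shared lock was restored with the procedure

        Proc(String name, boolean needsExclusive, boolean holdsSharedLock) {
            this.name = name;
            this.needsExclusive = needsExclusive;
            this.holdsSharedLock = holdsSharedLock;
        }
    }

    // Put restored procedures into the table queue. When lockHoldersFirst is
    // true, procedures that already hold a lock go to the front of the queue.
    static Deque<Proc> load(List<Proc> restored, boolean lockHoldersFirst) {
        Deque<Proc> queue = new ArrayDeque<>();
        for (Proc p : restored) {
            if (lockHoldersFirst && p.holdsSharedLock) {
                queue.addFirst(p);
            } else {
                queue.addLast(p);
            }
        }
        return queue;
    }

    // Single worker: run the head while it can make progress. If the head needs
    // the exclusive lock while a shared lock is still held, the worker gives up;
    // in this model nothing puts the queue back, so the rest never runs.
    static List<String> drain(Deque<Proc> queue) {
        int sharedHolders = 0;
        for (Proc p : queue) {
            if (p.holdsSharedLock) {
                sharedHolders++;
            }
        }
        List<String> executed = new ArrayList<>();
        while (!queue.isEmpty()) {
            Proc head = queue.peekFirst();
            if (head.needsExclusive && sharedHolders > 0) {
                break; // head is blocked, queue drops out of the run queue
            }
            queue.pollFirst();
            executed.add(head.name);
            if (head.holdsSharedLock) {
                sharedHolders--; // finishing the procedure releases its lock
            }
        }
        return executed;
    }

    public static void main(String[] args) {
        List<Proc> restored = Arrays.asList(
            new Proc("tableProc", true, false),   // needs the exclusive lock
            new Proc("regionProc", false, true)); // holds the restored shared lock
        System.out.println(drain(load(restored, false))); // prints []
        System.out.println(drain(load(restored, true)));  // prints [regionProc, tableProc]
    }
}
```

In this model the restore order alone decides whether the queue makes progress. The real scheduler is concurrent and has more moving parts, so the sketch should be read as an illustration of the ordering idea only.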



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
