[ 
https://issues.apache.org/jira/browse/HBASE-21050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang updated HBASE-21050:
-------------------------------
    Status: Patch Available  (was: Open)

> Exclusive lock may be held by a SUCCESS state procedure forever
> ---------------------------------------------------------------
>
>                 Key: HBASE-21050
>                 URL: https://issues.apache.org/jira/browse/HBASE-21050
>             Project: HBase
>          Issue Type: Sub-task
>          Components: amv2
>    Affects Versions: 2.0.1, 2.1.0
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Major
>         Attachments: HBASE-21050.branch-2.0.001.patch
>
>
> After HBASE-20846, we restore lock info for procedures. But there is a case 
> where the lock can be held by an already successful procedure. Since the 
> procedure won't execute again, the lock will be held by the procedure forever.
> 1. All children of pid=1208 had finished, but before procedure 1208 
> woke up, the master was killed:
> {code}
> 2018-08-05 02:20:14,465 INFO  [PEWorker-8] 
> procedure2.ProcedureExecutor(1659): Finished subprocedure(s) of pid=1208, 
> ppid=1206, state=RUNNABLE, hasLock=true; MoveRegionProcedure 
> hri=c2a23a735f16df57299dba6fd4599f2f, 
> source=e010125050127.bja,60020,1533403109034, 
> destination=e010125050127.bja,60020,1533403109034; resume parent processing.
> 2018-08-05 02:20:14,466 INFO  [PEWorker-8] 
> procedure2.ProcedureExecutor(1296): Finished pid=1232, ppid=1208, 
> state=SUCCESS, hasLock=false; AssignProcedure 
> table=IntegrationTestBigLinkedList, 
> region=c2a23a735f16df57299dba6fd4599f2f, target=e010125050127.bja,60020,1533403109034 
> in 1.5060sec
> {code}
> 2. The master restarts. Since procedure 1208 held the lock before the 
> restart, the lock was restored for it:
> {code}
> 2018-08-05 02:20:30,803 DEBUG [Thread-15] procedure2.ProcedureExecutor(456): 
> Loading pid=1208, ppid=1206, state=SUCCESS, hasLock=false; 
> MoveRegionProcedure hri=c2a23a735f16df57299dba6fd4599f2f, 
> source=e010125050127.bja,60020,1533403109034, 
> destination=e010125050127.bja,60020,1533403109034
> 2018-08-05 02:20:30,818 DEBUG [Thread-15] procedure2.Procedure(898): 
> pid=1208, ppid=1206, state=SUCCESS, hasLock=false; MoveRegionProcedure 
> hri=c2a23a735f16df57299dba6fd4599f2f, source=e010125050127.bja,60020,1533403109034, 
> destination=e010125050127.bja,60020,1533403109034 held 
> the lock before restarting, call acquireLock to restore it.
> 2018-08-05 02:20:30,818 INFO  [Thread-15] 
> procedure.MasterProcedureScheduler(631): pid=1208, ppid=1206, state=SUCCESS, 
> hasLock=false; MoveRegionProcedure hri=c2a23a735f16df57299dba6fd4599f2f, 
> source=e010125050127.bja,60020,1533403109034, 
> destination=e010125050127.bja,60020,1533403109034 checking lock on 
> c2a23a735f16df57299dba6fd4599f2f
> {code}
> 3. Since procedure 1208 is already in the SUCCESS state, it won't execute 
> again, so the lock will be held by it forever.
> We need to check the state of the procedure before restoring locks: if the 
> procedure has already finished (success or rollback), we do not need to 
> acquire the lock for it.
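
The check proposed in the description can be sketched roughly as follows. This is a minimal, self-contained illustration and not the attached patch: the class `LockRestoreSketch`, the `Proc` type, its `heldLockBeforeRestart` flag, and the `restoreLocks` method are hypothetical stand-ins for the real procedure2 classes, and pid=9999 is a made-up unfinished procedure used only for contrast.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LockRestoreSketch {
    // Hypothetical, simplified set of procedure states.
    enum State { RUNNABLE, WAITING, SUCCESS, ROLLEDBACK }

    // Hypothetical stand-in for a procedure loaded from the store on restart.
    static final class Proc {
        final long pid;
        final State state;
        final boolean heldLockBeforeRestart;

        Proc(long pid, State state, boolean heldLockBeforeRestart) {
            this.pid = pid;
            this.state = state;
            this.heldLockBeforeRestart = heldLockBeforeRestart;
        }

        // A finished procedure (success or rollback) will never execute again.
        boolean isFinished() {
            return state == State.SUCCESS || state == State.ROLLEDBACK;
        }
    }

    // Returns the pids whose locks are restored after a restart.
    // Finished procedures are skipped, so they cannot hold a lock forever.
    static List<Long> restoreLocks(List<Proc> loaded) {
        List<Long> restored = new ArrayList<>();
        for (Proc p : loaded) {
            if (!p.heldLockBeforeRestart) {
                continue; // nothing to restore
            }
            if (p.isFinished()) {
                continue; // would never release the lock, so do not re-acquire it
            }
            restored.add(p.pid); // a real implementation would acquire the lock here
        }
        return restored;
    }

    public static void main(String[] args) {
        List<Proc> loaded = Arrays.asList(
            new Proc(1208, State.SUCCESS, true),   // the case from the logs
            new Proc(9999, State.RUNNABLE, true)); // hypothetical unfinished procedure
        System.out.println(restoreLocks(loaded)); // prints [9999]
    }
}
```

With this check, the lock for pid=1208 is not re-acquired on restart, while a procedure that still has work to do keeps its lock.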



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
