[
https://issues.apache.org/jira/browse/HBASE-20949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559091#comment-16559091
]
Duo Zhang commented on HBASE-20949:
-----------------------------------
The problem here is that, we seem to finish the procedure twice.
{noformat}
2018-07-26 22:35:44,762 INFO [PEWorker-4] procedure2.ProcedureExecutor(1551):
Initialized subprocedures=[{pid=96, ppid=95,
state=RUNNABLE:REGION_TRANSITION_QUEUE, hasLock=false; AssignProcedure
table=testRestoreSnapshotAfterSplittingRegions-1532644538441,
region=d0b0215a67002d7c19486ee75f610d94}, {pid=97, ppid=95,
state=RUNNABLE:REGION_TRANSITION_QUEUE, hasLock=false; AssignProcedure
table=testRestoreSnapshotAfterSplittingRegions-1532644538441,
region=9ae1ea9ba045c13d80b255e0de39eafb}, {pid=98, ppid=95,
state=RUNNABLE:REGION_TRANSITION_QUEUE, hasLock=false; AssignProcedure
table=testRestoreSnapshotAfterSplittingRegions-1532644538441,
region=46cd0014b889f5540cb5e356c5fb7d6e}, {pid=99, ppid=95,
state=RUNNABLE:REGION_TRANSITION_QUEUE, hasLock=false; AssignProcedure
table=testRestoreSnapshotAfterSplittingRegions-1532644538441,
region=201c3520780fedb059b26f6d757c250f}, {pid=100, ppid=95,
state=RUNNABLE:REGION_TRANSITION_QUEUE, hasLock=false; AssignProcedure
table=testRestoreSnapshotAfterSplittingRegions-1532644538441,
region=c5e9ddc19cbd286cf4a016f889e79732}, {pid=101, ppid=95,
state=RUNNABLE:REGION_TRANSITION_QUEUE, hasLock=false; AssignProcedure
table=testRestoreSnapshotAfterSplittingRegions-1532644538441,
region=e1d6b5e83f6ed6175d95b0a628627251}]
2018-07-26 22:35:44,819 DEBUG [PEWorker-8] procedure2.LockAndQueue(123):
pid=101, ppid=95, state=RUNNABLE:REGION_TRANSITION_QUEUE, hasLock=false;
AssignProcedure table=testRestoreSnapshotAfterSplittingRegions-1532644538441,
region=e1d6b5e83f6ed6175d95b0a628627251 acquire shared lock 3284bcdf:
exclusiveLockOwner=NONE, sharedLockCount=1, waitingProcCount=0 succeeded
2018-07-26 22:35:44,819 DEBUG [PEWorker-8] procedure2.LockAndQueue(123):
pid=101, ppid=95, state=RUNNABLE:REGION_TRANSITION_QUEUE, hasLock=false;
AssignProcedure table=testRestoreSnapshotAfterSplittingRegions-1532644538441,
region=e1d6b5e83f6ed6175d95b0a628627251 acquire shared lock 7f6eec32:
exclusiveLockOwner=95, sharedLockCount=0, waitingProcCount=0 succeeded
2018-07-26 22:35:44,819 INFO [PEWorker-8]
procedure.MasterProcedureScheduler(689): pid=101, ppid=95,
state=RUNNABLE:REGION_TRANSITION_QUEUE, hasLock=false; AssignProcedure
table=testRestoreSnapshotAfterSplittingRegions-1532644538441,
region=e1d6b5e83f6ed6175d95b0a628627251 checking lock on
e1d6b5e83f6ed6175d95b0a628627251
2018-07-26 22:35:44,871 INFO [PEWorker-8] assignment.AssignProcedure(218):
Starting pid=101, ppid=95, state=RUNNABLE:REGION_TRANSITION_QUEUE,
hasLock=true; AssignProcedure
table=testRestoreSnapshotAfterSplittingRegions-1532644538441,
region=e1d6b5e83f6ed6175d95b0a628627251; rit=OFFLINE,
location=asf911.gq1.ygridcore.net,41392,1532644489849; forceNewPlan=false,
retain=true
2018-07-26 22:35:45,022 INFO [PEWorker-5] assignment.RegionStateStore(199):
pid=101 updating hbase:meta row=e1d6b5e83f6ed6175d95b0a628627251,
regionState=OPENING, regionLocation=asf911.gq1.ygridcore.net,41392,1532644489849
2018-07-26 22:35:45,026 INFO [PEWorker-5]
assignment.RegionTransitionProcedure(241): Dispatch pid=101, ppid=95,
state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=true; AssignProcedure
table=testRestoreSnapshotAfterSplittingRegions-1532644538441,
region=e1d6b5e83f6ed6175d95b0a628627251; rit=OPENING,
location=asf911.gq1.ygridcore.net,41392,1532644489849
2018-07-26 22:35:45,198 DEBUG
[RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=57564]
assignment.RegionTransitionProcedure(264): Received report OPENED seqId=5,
pid=101, ppid=95, state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=true;
AssignProcedure table=testRestoreSnapshotAfterSplittingRegions-1532644538441,
region=e1d6b5e83f6ed6175d95b0a628627251; rit=OPENING,
location=asf911.gq1.ygridcore.net,41392,1532644489849
2018-07-26 22:35:45,198 DEBUG [PEWorker-15]
assignment.RegionTransitionProcedure(354): Finishing pid=101, ppid=95,
state=RUNNABLE:REGION_TRANSITION_FINISH, hasLock=true; AssignProcedure
table=testRestoreSnapshotAfterSplittingRegions-1532644538441,
region=e1d6b5e83f6ed6175d95b0a628627251; rit=OPENING,
location=asf911.gq1.ygridcore.net,41392,1532644489849
2018-07-26 22:35:45,199 INFO [PEWorker-15] assignment.RegionStateStore(199):
pid=101 updating hbase:meta row=e1d6b5e83f6ed6175d95b0a628627251,
regionState=OPEN, openSeqNum=5,
regionLocation=asf911.gq1.ygridcore.net,41392,1532644489849
2018-07-26 22:35:45,276 DEBUG [PEWorker-5] procedure2.LockAndQueue(132):
pid=101, ppid=95, state=SUCCESS, hasLock=false; AssignProcedure
table=testRestoreSnapshotAfterSplittingRegions-1532644538441,
region=e1d6b5e83f6ed6175d95b0a628627251 release shared lock 7f6eec32:
exclusiveLockOwner=NONE, sharedLockCount=6, waitingProcCount=0
2018-07-26 22:35:45,276 DEBUG [PEWorker-5] procedure2.LockAndQueue(132):
pid=101, ppid=95, state=SUCCESS, hasLock=false; AssignProcedure
table=testRestoreSnapshotAfterSplittingRegions-1532644538441,
region=e1d6b5e83f6ed6175d95b0a628627251 release shared lock 3284bcdf:
exclusiveLockOwner=NONE, sharedLockCount=6, waitingProcCount=0
2018-07-26 22:35:45,277 INFO [PEWorker-5] procedure2.ProcedureExecutor(1307):
Finished pid=101, ppid=95, state=SUCCESS, hasLock=false; AssignProcedure
table=testRestoreSnapshotAfterSplittingRegions-1532644538441,
region=e1d6b5e83f6ed6175d95b0a628627251 in 439msec
2018-07-26 22:35:45,543 DEBUG [PEWorker-15] procedure2.LockAndQueue(132):
pid=101, ppid=95, state=SUCCESS, hasLock=false; AssignProcedure
table=testRestoreSnapshotAfterSplittingRegions-1532644538441,
region=e1d6b5e83f6ed6175d95b0a628627251 release shared lock 7f6eec32:
exclusiveLockOwner=NONE, sharedLockCount=4, waitingProcCount=0
2018-07-26 22:35:45,544 DEBUG [PEWorker-15] procedure2.LockAndQueue(132):
pid=101, ppid=95, state=SUCCESS, hasLock=false; AssignProcedure
table=testRestoreSnapshotAfterSplittingRegions-1532644538441,
region=e1d6b5e83f6ed6175d95b0a628627251 release shared lock 3284bcdf:
exclusiveLockOwner=NONE, sharedLockCount=4, waitingProcCount=0
2018-07-26 22:35:45,544 INFO [PEWorker-15] procedure2.ProcedureExecutor(1307):
Finished pid=101, ppid=95, state=SUCCESS, hasLock=false; AssignProcedure
table=testRestoreSnapshotAfterSplittingRegions-1532644538441,
region=e1d6b5e83f6ed6175d95b0a628627251 in 439msec
{noformat}
The PEWorker-5 and PEWORKER-15 both finish the procedure causes the double
release.
Anyway let me revert the debug commit first as it prints too many logs.
> Split/Merge table can be executed concurrently with DisableTableProcedure
> -------------------------------------------------------------------------
>
> Key: HBASE-20949
> URL: https://issues.apache.org/jira/browse/HBASE-20949
> Project: HBase
> Issue Type: Sub-task
> Reporter: Duo Zhang
> Priority: Major
> Attachments: HBASE-20949-debug.patch
>
>
> The top flaky tests on the dashboard are all because of this.
> TestRestoreSnapshotFromClient
> TestSimpleRegionNormalizerOnCluster
> Theoretically this should not happen, need to dig more.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)