[ https://issues.apache.org/jira/browse/HBASE-21787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16752867#comment-16752867 ]
Duo Zhang commented on HBASE-21787:
-----------------------------------
The assumption here is that a region can have only one TRSP at a time.
But we unset the TRSP on the region as part of finishing it, so it is possible
that after we unset the TRSP, and before it has actually finished, a new TRSP
gets scheduled. If the master restarts in that window, both TRSPs are loaded
on restart.
But things are a bit strange here. In the above scenario the old TRSP must be
in RUNNABLE state, so that it can finish itself. Here, however, it is in
WAITING state.
Need to find out why. Do you have the logs from before the restart?
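
To make the window described above concrete, here is a minimal toy model of
it. This is only a sketch: Proc, RegionNode and Store are hypothetical
stand-ins, not HBase classes, and the pids are borrowed from the log below
purely as labels. It only shows how "unset, then schedule, then stop before
finish" leaves two unfinished procedures for one region. It does not explain
why pid=1738 is in WAITING state.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these types are HBase classes.
class TrspRaceSketch {

    /** Hypothetical stand-in for a persisted region transition procedure. */
    static final class Proc {
        final long pid;
        boolean finished; // true once the toy store would drop the procedure
        Proc(long pid) { this.pid = pid; }
    }

    /** Toy region node: at most one procedure may be attached at a time. */
    static final class RegionNode {
        Proc attached;
    }

    /** Toy procedure store: every unfinished procedure is reloaded on restart. */
    static final class Store {
        final List<Proc> procs = new ArrayList<>();

        Proc schedule(RegionNode node, long pid) {
            // Enforces the "one procedure per region" assumption.
            if (node.attached != null) {
                throw new IllegalStateException("region already has pid=" + node.attached.pid);
            }
            Proc p = new Proc(pid);
            procs.add(p);
            node.attached = p;
            return p;
        }

        List<Long> unfinishedPids() {
            List<Long> pids = new ArrayList<>();
            for (Proc p : procs) {
                if (!p.finished) {
                    pids.add(p.pid);
                }
            }
            return pids;
        }
    }

    /** Pids a restart would reload if the master stops inside the window. */
    static List<Long> crashInsideWindow() {
        Store store = new Store();
        RegionNode node = new RegionNode();
        store.schedule(node, 1738L);
        node.attached = null;         // the old procedure unsets itself first
        store.schedule(node, 4351L);  // a new procedure is scheduled in the window
        // The master stops here: the old procedure is never marked finished.
        return store.unfinishedPids();
    }

    /** Same steps, but the old procedure finishes before the restart. */
    static List<Long> finishBeforeRestart() {
        Store store = new Store();
        RegionNode node = new RegionNode();
        Proc old = store.schedule(node, 1738L);
        node.attached = null;
        store.schedule(node, 4351L);
        old.finished = true;          // the old procedure completes normally
        return store.unfinishedPids();
    }

    public static void main(String[] args) {
        System.out.println(crashInsideWindow());   // [1738, 4351]
        System.out.println(finishBeforeRestart()); // [4351]
    }
}
```

In the toy model the check in schedule() passes for the second procedure
because the region no longer points at the first one, even though the first
one is still in the store.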
> proc WAL replaces a RIT that holds a lock with a RIT that doesn't
> -----------------------------------------------------------------
>
> Key: HBASE-21787
> URL: https://issues.apache.org/jira/browse/HBASE-21787
> Project: HBase
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Sergey Shelukhin
> Priority: Critical
>
> This is related to, but not the same as, HBASE-21786: after a master
> restart, two RITs are both in the proc WAL. According to the comment at the
> place where the RIT is restored, this is expected.
> However, what happens is that the master takes the lock for the older RIT and
> then replaces the older RIT with the newer RIT on the region.
> You can see two "to restore RIT" log lines.
> Both RITs are still active in the procedures view (and stuck due to yet
> another bug that I will file later). However, it seems wrong that the lock is
> held by one RIT while the region points to the other RIT as the correct one.
> {noformat}
> 2019-01-25 11:26:54,616 INFO [master/master:17000:becomeActiveMaster]
> procedure.MasterProcedureScheduler: Took xlock for pid=1738, ppid=3,
> state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED, hasLock=false;
> TransitRegionStateProcedure table=table,
> region=27f7ab2a05d9d730b2ab2339d1531b8e, ASSIGN
> 2019-01-25 11:26:54,834 INFO [master/master:17000:becomeActiveMaster]
> assignment.AssignmentManager: Attach pid=1738, ppid=3,
> state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED, hasLock=false;
> TransitRegionStateProcedure table=table,
> region=27f7ab2a05d9d730b2ab2339d1531b8e, ASSIGN to rit=OFFLINE,
> location=null, table=table, region=27f7ab2a05d9d730b2ab2339d1531b8e to
> restore RIT
> 2019-01-25 11:26:54,853 INFO [master/master:17000:becomeActiveMaster]
> assignment.AssignmentManager: Attach pid=4351,
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=false;
> TransitRegionStateProcedure table=table,
> region=27f7ab2a05d9d730b2ab2339d1531b8e, ASSIGN to rit=OFFLINE,
> location=null, table=table, region=27f7ab2a05d9d730b2ab2339d1531b8e to
> restore RIT
> 2019-01-25 11:27:02,460 INFO [master/master:17000:becomeActiveMaster]
> assignment.RegionStateStore: Load hbase:meta entry
> region=27f7ab2a05d9d730b2ab2339d1531b8e, regionState=OPENING,
> lastHost=server1,17020,1548290445704,
> regionLocation=server2,17020,1548442571056, openSeqNum=120108
> 2019-01-25 11:27:10,184 INFO [PEWorker-11]
> procedure.MasterProcedureScheduler: Waiting on xlock for pid=4351,
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=false;
> TransitRegionStateProcedure table=table,
> region=27f7ab2a05d9d730b2ab2339d1531b8e, ASSIGN held by pid=1738
> {noformat}