[ https://issues.apache.org/jira/browse/HBASE-21757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Shelukhin updated HBASE-21757:
-------------------------------------
Attachment: (was: image-2019-01-22-11-00-37-578.png)
> retrying to close a region incorrectly resets its RIT age metric
> ----------------------------------------------------------------
>
> Key: HBASE-21757
> URL: https://issues.apache.org/jira/browse/HBASE-21757
> Project: HBase
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Sergey Shelukhin
> Priority: Major
>
> We have a region stuck in RIT indefinitely because of a separate bug that I
> will file later.
> Every 10 minutes the close is retried (the typical split-brain retry). I
> noticed that each retry resets the region's RIT age, so the "oldest RIT"
> metric never exceeds ~10 minutes even though the region has been stuck for
> days.
> {noformat}
> 2019-01-22 10:40:52,993 INFO [PEWorker-10] assignment.RegionStateStore:
> pid=1865 updating hbase:meta row=region, regionState=CLOSING,
> regionLocation=server,17020,1547824687684
> 2019-01-22 10:40:53,025 WARN [PEWorker-10]
> assignment.RegionRemoteProcedureBase: Can not add remote operation pid=29297,
> ppid=1865, state=RUNNABLE, hasLock=true;
> org.apache.hadoop.hbase.master.assignment.CloseRegionProcedure for region
> {ENCODED => region, ...} to server server,17020,1547824687684, this usually
> because the server is alread dead, give up and mark the procedure as
> complete, the parent procedure will take care of this.
> 2019-01-22 10:40:53,040 INFO [PEWorker-10] procedure2.ProcedureExecutor:
> Finished subprocedure(s) of pid=1865,
> state=RUNNABLE:REGION_STATE_TRANSITION_CONFIRM_CLOSED, hasLock=true;
> TransitRegionStateProcedure table=table, region=region, REOPEN/MOVE; resume
> parent processing.
> 2019-01-22 10:40:53,040 WARN [PEWorker-7]
> assignment.TransitRegionStateProcedure: Failed transition, suspend 600secs
> pid=1865, state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE, hasLock=true;
> TransitRegionStateProcedure table=table, region=region, REOPEN/MOVE;
> rit=CLOSING, location=server,17020,1547824687684; waiting on rectified
> condition fixed by other Procedure or operator intervention
> 2019-01-22 10:40:53,040 INFO [PEWorker-7] procedure2.TimeoutExecutorThread:
> ADDED pid=1865, state=WAITING_TIMEOUT:REGION_STATE_TRANSITION_CLOSE,
> hasLock=true; TransitRegionStateProcedure table=table, region=region,
> REOPEN/MOVE; timeout=600000, timestamp=1548183053040
> {noformat}
> !image-2019-01-22-11-00-39-030.png!
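
To make the reported symptom concrete, here is a minimal sketch of how an age that is measured from the last retry stays capped at the retry interval, while an age measured from the original entry into RIT keeps growing. It is an illustration only: the class RitAgeSketch, its fields, and the simulate method are hypothetical and are not taken from the HBase code base, and the report does not say which field or code path actually causes the reset. The only value taken from the report is the 600000 ms retry timeout shown in the log.

```java
/** Hypothetical sketch; not HBase's actual region state or metrics code. */
final class RitAgeSketch {

    /** Minimal stand-in for the bookkeeping of one region in transition. */
    static final class RegionInTransition {
        private final long ritStartMs; // set once, when the region first enters RIT
        private long lastUpdateMs;     // refreshed on every retry (assumed cause of the reset)

        RegionInTransition(long nowMs) {
            this.ritStartMs = nowMs;
            this.lastUpdateMs = nowMs;
        }

        /** A retry re-writes the state and refreshes only lastUpdateMs. */
        void retry(long nowMs) {
            this.lastUpdateMs = nowMs;
        }

        /** Age measured from the last retry: this is the value that keeps resetting. */
        long ageSinceLastUpdateMs(long nowMs) {
            return nowMs - lastUpdateMs;
        }

        /** Age measured from the original entry into RIT: retries do not affect it. */
        long ageSinceRitStartMs(long nowMs) {
            return nowMs - ritStartMs;
        }
    }

    /**
     * Simulates a retry every retryIntervalMs for totalMs.
     * Returns {largest age seen when measured from the last retry,
     *          final age when measured from the original RIT start}.
     */
    static long[] simulate(long retryIntervalMs, long totalMs) {
        RegionInTransition rit = new RegionInTransition(0L);
        long maxObserved = 0L;
        for (long now = retryIntervalMs; now <= totalMs; now += retryIntervalMs) {
            // Sample the metric just before the retry resets it.
            maxObserved = Math.max(maxObserved, rit.ageSinceLastUpdateMs(now));
            rit.retry(now);
        }
        return new long[] { maxObserved, rit.ageSinceRitStartMs(totalMs) };
    }

    public static void main(String[] args) {
        long retryIntervalMs = 600_000L;             // timeout=600000 from the log
        long twoDaysMs = 2L * 24 * 60 * 60 * 1000;   // "stuck for days"
        long[] result = simulate(retryIntervalMs, twoDaysMs);
        System.out.println("max age if reset on retry (ms): " + result[0]);
        System.out.println("age from original RIT start (ms): " + result[1]);
    }
}
```

In this sketch an "oldest RIT" metric built on the first value would never report more than 600000 ms, which matches the behavior described above; one built on the second value would report the full two days.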