[
https://issues.apache.org/jira/browse/HBASE-28158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18019209#comment-18019209
]
Viraj Jasani commented on HBASE-28158:
--------------------------------------
In the SCP case, instead of RITs, how about we consider using an
"inconsistentRegions" metric, i.e. increment it by the number of regions on the
target regionserver? As soon as the TRSPs are completed, we decrement
inconsistentRegions by the number of regions on the target regionserver.
With RIT comes RITOldestAge, which is determined by the last update time in meta
(as Umesh mentioned above).
Both RITs and inconsistencies are used as high-priority observability metrics.
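
The increment/decrement idea above could be sketched roughly as follows. This
is only an illustration: the class InconsistentRegionsGauge and its methods
onScpStarted/onTrspsCompleted are hypothetical names, not existing HBase APIs,
and the sketch assumes the count recorded when the SCP starts is the same
count that should be subtracted once the TRSPs finish.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the proposed metric; not HBase's actual metrics code.
class InconsistentRegionsGauge {
    // Current value of the (assumed) "inconsistentRegions" metric.
    private final AtomicLong inconsistentRegions = new AtomicLong();
    // Regions counted per crashed server, so each decrement matches its increment.
    private final Map<String, Long> countedByServer = new ConcurrentHashMap<>();

    // Called when an SCP starts for a regionserver hosting the given number of regions.
    void onScpStarted(String serverName, int regionsOnServer) {
        if (regionsOnServer < 0) {
            throw new IllegalArgumentException("regionsOnServer must be >= 0");
        }
        // Count a server only once, even if the hook fires again for the same SCP.
        Long previous = countedByServer.putIfAbsent(serverName, (long) regionsOnServer);
        if (previous == null) {
            inconsistentRegions.addAndGet(regionsOnServer);
        }
    }

    // Called once all TRSPs scheduled for that server's regions have completed.
    void onTrspsCompleted(String serverName) {
        Long counted = countedByServer.remove(serverName);
        if (counted != null) {
            inconsistentRegions.addAndGet(-counted);
        }
    }

    long value() {
        return inconsistentRegions.get();
    }
}
```

Tracking the per-server count is a design choice of this sketch: it keeps the
gauge from drifting if the number of regions attributed to the server changes
between the start of the SCP and the completion of its TRSPs.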
> Decouple RIT list management from TRSP invocation
> -------------------------------------------------
>
> Key: HBASE-28158
> URL: https://issues.apache.org/jira/browse/HBASE-28158
> Project: HBase
> Issue Type: Bug
> Components: master, Region Assignment
> Affects Versions: 2.6.0, 2.5.6, 3.0.0-beta-1
> Reporter: Andrew Kyle Purtell
> Priority: Major
> Labels: pull-request-available
>
> Operators bypassed some in-progress TRSPs, leading to a state where some
> regions were persistently in transition but hidden. Because the master builds
> its list of regions in transition by tracking TRSP, the bypass of TRSP
> removed the regions from the RIT list.
> Although I can see from reading the code this is the expected behavior, it is
> surprising for operators and should be changed. Operators expect that regions
> that should be open but are not appear in the master's RIT list, provided by
> /rits.jsp, the output of the shell's 'rit' command, and in ClusterStatus.
> We should only remove a region from the RIT map when assignment reaches a
> suitable terminal state.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)