HBase 2 changed how the RIT (regions in transition) metric and the RIT-over-threshold metric are calculated. In HBase 1 they were calculated by looking at region assignment state. Since the introduction of AMv2, the RIT-related metrics are instead tied to whether a TRSP (TransitRegionStateProcedure) is executing. The problem is that the two do not always correspond: bugs, operator administrative activity, or operator error can leave a region offline when it should be assigned, while the RIT and RIT-over-threshold metrics both read 0. We encountered this state in production, and it got us thinking more deeply about RIT tracking.
After HBASE-28158 (Decouple RIT list management from TRSP invocation), a region is considered in transition whenever its current state is not the desired terminal state for the table's 'enabled' status. If a table is enabled and one of its regions is not in OPEN state, that region is, by the new definition, in transition (and perhaps stuck). Conversely, if a table is disabled and one of its regions is not in CLOSED state, that region is in transition (and perhaps stuck).
We are going to adopt this change in our 2.5-based production, but I want to run it by the community before merging the change all the way back to 2.5 in open source, which would include 2.6 as well. The RIT and RIT-over-threshold metrics will be calculated differently (IMHO, now correctly), so this may affect your production metrics and monitoring. I can stop at branch-2 for now or bring it all the way back. Are there any concerns?
--
Best regards,
Andrew
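
P.S. For anyone who prefers code to prose, here is a rough sketch of the new definition as described above. It is a toy model, not the actual patch: the class RitSketch, its simplified State enum, and both method names are made up for illustration. The real RegionState.State has more values, and the real change has to handle cases this sketch ignores.

```java
import java.util.List;

// Toy model of the post-HBASE-28158 definition; all names here are hypothetical.
class RitSketch {
  // Simplified stand-in for region states; not the full set HBase uses.
  enum State { OFFLINE, OPENING, OPEN, CLOSING, CLOSED }

  // Minimal holder for what the predicate needs to know about a region.
  static final class Region {
    final String name;
    final boolean tableEnabled;
    final State state;

    Region(String name, boolean tableEnabled, State state) {
      this.name = name;
      this.tableEnabled = tableEnabled;
      this.state = state;
    }
  }

  // A region is in transition when its state differs from the terminal state
  // implied by its table: OPEN for an enabled table, CLOSED for a disabled one.
  static boolean isInTransition(boolean tableEnabled, State state) {
    State desired = tableEnabled ? State.OPEN : State.CLOSED;
    return state != desired;
  }

  // The RIT count under this definition, independent of any running procedure.
  static long countInTransition(List<Region> regions) {
    return regions.stream()
        .filter(r -> isInTransition(r.tableEnabled, r.state))
        .count();
  }

  public static void main(String[] args) {
    List<Region> regions = List.of(
        new Region("r1", true, State.OPEN),     // enabled and open: settled
        new Region("r2", true, State.OFFLINE),  // enabled but offline: in transition
        new Region("r3", false, State.CLOSED),  // disabled and closed: settled
        new Region("r4", false, State.OPEN));   // disabled but open: in transition
    System.out.println(countInTransition(regions)); // prints 2
  }
}
```

The point of the sketch is that r2 is counted even if no procedure is currently acting on it, which is exactly the case the procedure-based metric reported as 0.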
