Caroline created HBASE-25815:
--------------------------------
Summary: RSGroupBasedLoadBalancer online status never updates
after being set to true for the first time
Key: HBASE-25815
URL: https://issues.apache.org/jira/browse/HBASE-25815
Project: HBase
Issue Type: Bug
Reporter: Caroline
Once the RSGroupBasedLoadBalancer is “online” (it has found the hbase:meta and
hbase:rsgroup tables), it never updates that status again. This means that if
hbase:meta or hbase:rsgroup later goes offline, the balancer does not switch its
status back to “offline,” and callers keep taking the “online” code path even
though the catalog tables can no longer be read from or written to (in
particular, anything that calls RSGroupInfoManagerImpl#flushConfig).
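To make the latching problem concrete, here is a minimal sketch, not HBase code, that contrasts a flag which is set once and never cleared with one that is re-evaluated on every check. The class names LatchedStatus and RefreshingStatus are hypothetical, and the BooleanSupplier probe is an assumed stand-in for "hbase:meta and hbase:rsgroup are available".

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch, not RSGroupBasedLoadBalancer: contrasts a latched
 * "online" flag with one that reflects current catalog-table availability.
 */
public class OnlineStatusSketch {

    /** Buggy pattern: once set to true, the flag is never cleared. */
    static final class LatchedStatus {
        private final BooleanSupplier catalogTablesAvailable;
        private volatile boolean online = false;

        LatchedStatus(BooleanSupplier catalogTablesAvailable) {
            this.catalogTablesAvailable = catalogTablesAvailable;
        }

        boolean isOnline() {
            if (!online && catalogTablesAvailable.getAsBoolean()) {
                online = true; // latched: nothing ever sets it back to false
            }
            return online;
        }
    }

    /** Alternative pattern: status follows the probe on every call. */
    static final class RefreshingStatus {
        private final BooleanSupplier catalogTablesAvailable;

        RefreshingStatus(BooleanSupplier catalogTablesAvailable) {
            this.catalogTablesAvailable = catalogTablesAvailable;
        }

        boolean isOnline() {
            return catalogTablesAvailable.getAsBoolean();
        }
    }

    public static void main(String[] args) {
        // Simulated availability of hbase:meta and hbase:rsgroup (assumption).
        boolean[] tablesUp = {true};
        BooleanSupplier probe = () -> tablesUp[0];

        LatchedStatus latched = new LatchedStatus(probe);
        RefreshingStatus refreshing = new RefreshingStatus(probe);

        System.out.println(latched.isOnline() + " " + refreshing.isOnline()); // true true
        tablesUp[0] = false; // catalog tables go offline
        System.out.println(latched.isOnline() + " " + refreshing.isOnline()); // true false
    }
}
```

Probing on every call is the simplest way to show the difference; a real balancer would likely want a cheaper or cached availability check, which this sketch does not attempt to model.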
Also, in the RSGroupInfoManagerImpl#flushConfig code path, the write to
hbase:rsgroup happens before the in-memory rsGroupMap and tableMap are updated
(see the order of [these lines of
code|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupInfoManagerImpl.java#L664-L670]).
As a result, if hbase:rsgroup goes offline after the RSGroupBasedLoadBalancer
has already been marked “online,” the exception thrown by the write to the
offline hbase:rsgroup table prevents the in-memory rsGroupMap and tableMap from
being updated. The in-memory state should be updated first, before the write to
hbase:rsgroup is attempted.
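The ordering issue can be illustrated with a minimal sketch, assuming a hypothetical GroupTable interface that stands in for hbase:rsgroup and a single table-to-group map that stands in for both rsGroupMap and tableMap. It is not the RSGroupInfoManagerImpl code; it only shows how the two orderings behave when the persistent write throws. Map.of requires Java 9 or later.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not RSGroupInfoManagerImpl#flushConfig: compares
 * "persist then update memory" with "update memory then persist".
 */
public class FlushOrderSketch {

    /** Stand-in for the hbase:rsgroup table; may be offline (assumption). */
    interface GroupTable {
        void write(Map<String, String> tableToGroup) throws IOException;
    }

    private volatile Map<String, String> tableMap = new HashMap<>();
    private final GroupTable table;

    FlushOrderSketch(GroupTable table) {
        this.table = table;
    }

    Map<String, String> tableMap() {
        return tableMap;
    }

    /** Order described in the issue: persist first, then update memory. */
    void flushPersistFirst(Map<String, String> newMap) throws IOException {
        table.write(newMap);              // throws if hbase:rsgroup is offline...
        tableMap = new HashMap<>(newMap); // ...so memory is never updated
    }

    /** Order the issue proposes: update memory first, then persist. */
    void flushMemoryFirst(Map<String, String> newMap) throws IOException {
        tableMap = new HashMap<>(newMap); // in-memory state reflects the change
        table.write(newMap);              // a failure is still reported to the caller
    }

    public static void main(String[] args) {
        GroupTable offline = m -> { throw new IOException("hbase:rsgroup offline"); };
        Map<String, String> update = Map.of("t1", "groupA");

        FlushOrderSketch before = new FlushOrderSketch(offline);
        try { before.flushPersistFirst(update); } catch (IOException expected) { }
        System.out.println(before.tableMap()); // {}

        FlushOrderSketch after = new FlushOrderSketch(offline);
        try { after.flushMemoryFirst(update); } catch (IOException expected) { }
        System.out.println(after.tableMap()); // {t1=groupA}
    }
}
```

With memory updated first, the in-memory maps can run ahead of what is persisted in hbase:rsgroup until a later write succeeds; the sketch does not model how that gap would be reconciled.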