[jira] [Updated] (HBASE-25815) RSGroupBasedLoadBalancer online status never updates after being set to true for the first time

Caroline Zhou (Jira) Mon, 16 Aug 2021 00:49:08 -0700


     [ 
https://issues.apache.org/jira/browse/HBASE-25815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Caroline Zhou updated HBASE-25815:
----------------------------------
    Description: 
Once the RSGroupBasedLoadBalancer is “online” (it has found the hbase:meta and 
hbase:rsgroup tables), it never updates that status again. If hbase:meta or 
hbase:rsgroup later goes offline, the balancer does not set its status back to 
“offline,” so some callers still take the “online” code path even though the 
catalog tables cannot be read from or written to (in particular, anything that 
calls RSGroupInfoManagerImpl#flushConfig).
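
The difference between a status flag that latches and one that is re-checked 
can be sketched as follows. This is a minimal illustration, not the HBase code: 
OnlineStatusSketch, LatchedStatus, RecheckedStatus and the tablesAvailable 
supplier are hypothetical names, and the AtomicBoolean merely stands in for the 
availability of hbase:meta and hbase:rsgroup.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: these classes are illustrative, not HBase code.
final class OnlineStatusSketch {

    /** Latches to true the first time the tables are seen and never resets. */
    static final class LatchedStatus {
        private final BooleanSupplier tablesAvailable;
        private volatile boolean online;

        LatchedStatus(BooleanSupplier tablesAvailable) {
            this.tablesAvailable = tablesAvailable;
        }

        boolean isOnline() {
            if (!online && tablesAvailable.getAsBoolean()) {
                online = true; // set once, never cleared
            }
            return online;
        }
    }

    /** Re-evaluates availability on every call, so it can report offline again. */
    static final class RecheckedStatus {
        private final BooleanSupplier tablesAvailable;

        RecheckedStatus(BooleanSupplier tablesAvailable) {
            this.tablesAvailable = tablesAvailable;
        }

        boolean isOnline() {
            return tablesAvailable.getAsBoolean();
        }
    }

    public static void main(String[] args) {
        // Stand-in for "hbase:meta and hbase:rsgroup are reachable".
        AtomicBoolean tablesUp = new AtomicBoolean(true);
        LatchedStatus latched = new LatchedStatus(tablesUp::get);
        RecheckedStatus rechecked = new RecheckedStatus(tablesUp::get);

        latched.isOnline();   // both observe the tables while they are up
        rechecked.isOnline();

        tablesUp.set(false);  // simulate the tables going offline
        System.out.println("latched:   " + latched.isOnline());   // prints "latched:   true"
        System.out.println("rechecked: " + rechecked.isOnline()); // prints "rechecked: false"
    }
}
```

Re-checking on every call is only one possible approach; a real fix might 
instead refresh the flag periodically or on failure.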

Also, in the RSGroupInfoManagerImpl#flushConfig code path, the write to 
hbase:rsgroup comes before the update to the in-memory rsGroupMap and tableMap 
(see the order of [these lines of 
code|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupInfoManagerImpl.java#L664-L670]).
If hbase:rsgroup goes offline after the RSGroupBasedLoadBalancer is already 
marked as “online,” the exception thrown by the failed write to hbase:rsgroup 
prevents the in-memory rsGroupMap and tableMap from being updated. The 
in-memory state should be updated first.
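
The effect of the two orderings can be sketched as follows. This is a 
simplified illustration under stated assumptions, not the actual flushConfig 
implementation: FlushOrderSketch, GroupTableWriter, flushPersistFirst, 
flushMemoryFirst and inMemoryGroups are hypothetical names, and a single 
Map<String, String> stands in for the in-memory rsGroupMap and tableMap.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: names and types are illustrative, not the HBase implementation.
final class FlushOrderSketch {

    /** Stand-in for the write to hbase:rsgroup; throws when the table is offline. */
    interface GroupTableWriter {
        void write(Map<String, String> groups) throws IOException;
    }

    private final GroupTableWriter writer;
    private Map<String, String> rsGroupMap = new HashMap<>(); // in-memory state

    FlushOrderSketch(GroupTableWriter writer) {
        this.writer = writer;
    }

    /** Order described in the issue: a failed write skips the in-memory update. */
    void flushPersistFirst(Map<String, String> newGroups) throws IOException {
        writer.write(newGroups);
        rsGroupMap = new HashMap<>(newGroups); // not reached if write() throws
    }

    /** Order the issue asks for: in-memory state changes even if the write fails. */
    void flushMemoryFirst(Map<String, String> newGroups) throws IOException {
        rsGroupMap = new HashMap<>(newGroups);
        writer.write(newGroups);
    }

    Map<String, String> inMemoryGroups() {
        return rsGroupMap;
    }

    public static void main(String[] args) {
        // Simulates hbase:rsgroup being offline: every write fails.
        GroupTableWriter offline = groups -> {
            throw new IOException("hbase:rsgroup offline");
        };
        Map<String, String> update = Collections.singletonMap("table1", "groupA");

        FlushOrderSketch persistFirst = new FlushOrderSketch(offline);
        try {
            persistFirst.flushPersistFirst(update);
        } catch (IOException e) {
            // expected: the write failed before the in-memory update
        }
        System.out.println(persistFirst.inMemoryGroups()); // prints "{}"

        FlushOrderSketch memoryFirst = new FlushOrderSketch(offline);
        try {
            memoryFirst.flushMemoryFirst(update);
        } catch (IOException e) {
            // expected: the write failed after the in-memory update
        }
        System.out.println(memoryFirst.inMemoryGroups()); // prints "{table1=groupA}"
    }
}
```

Note that with the memory-first order a failed write leaves the in-memory 
state ahead of the table, so a caller would presumably need to retry the write 
once hbase:rsgroup is available again.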

  was:
Once the RSGroupBasedLoadBalancer is “online” (it has found the hbase:meta and 
hbase:rsgroup tables), it will never update the status again. That means if 
hbase:meta or hbase:rsgroup ever go offline, the balancer doesn’t update its 
status to “offline,” so some of the code paths will go through the “online” 
code path even though the catalog tables aren’t available to be read from or 
written to (in particular, anything that calls 
RSGroupInfoManagerImpl#flushConfig).

Also, in the RSGroupInfoManagerImpl#flushConfig code path, the call to write to 
hbase:rsgroup comes before the update to the rsGroupMap and tableMap which are 
stored in memory (see order of [these lines of 
code|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupInfoManagerImpl.java#L664-L670]),
 so if hbase:rsgroup goes offline after the RSGroupBasedLoadBalancer is already 
marked as “online,” exceptions thrown while trying to write to an offline 
hbase:rsgroup table prevent the in-memory rsGroupMap and tableMap from being 
updated. In terms of the order just mentioned, in-memory state should be 
updated first.

Seems to be addressed by HBASE-22662


> RSGroupBasedLoadBalancer online status never updates after being set to true 
> for the first time
> -----------------------------------------------------------------------------------------------
>
>                 Key: HBASE-25815
>                 URL: https://issues.apache.org/jira/browse/HBASE-25815
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Caroline Zhou
>            Assignee: Caroline Zhou
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
