caroliney14 commented on a change in pull request #3606:
URL: https://github.com/apache/hbase/pull/3606#discussion_r697068145
##########
File path:
hbase-server/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupInfoManagerImpl.java
##########
@@ -825,6 +829,20 @@ private void createRSGroupTable() throws IOException {
}
public boolean isOnline() {
Review comment:
The reason is that if we do not periodically update the `online` status
to reflect the availability of the `hbase:rsgroup` table, we could become
blocked waiting on a flush to `hbase:rsgroup` while it is inaccessible
(e.g. it is stuck in transition, it is offline, or the RegionServer hosting it
has a backlog of queued requests). Each rsgroup operation (add, move servers,
move tables, remove, etc.) is `synchronized`, and furthermore the `multiMutate`
method, which persists changes to `hbase:rsgroup`, calls `Future.get` without a
timeout. So if `hbase:rsgroup` is unavailable, the operation stays blocked until
the client times out. Because it holds the lock, no other rsgroup request can
be served in the meantime, even though we could have exited early by checking
whether `hbase:rsgroup` is available.
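Here is a minimal Java sketch of the fail-fast idea. `OnlineGuardSketch`, the `tableAvailable` probe and the `persist` supplier are hypothetical stand-ins for illustration, not HBase APIs. The sketch refreshes an `online` flag periodically, and a `synchronized` operation checks that flag first. While `hbase:rsgroup` is unreachable, the operation throws immediately instead of holding the monitor on an untimed `Future.get()`.

```java
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical, simplified stand-in for RSGroupInfoManagerImpl; the availability
 * probe and the persist call are illustrative assumptions, not HBase APIs.
 */
class OnlineGuardSketch {
  private final BooleanSupplier tableAvailable; // hypothetical "is hbase:rsgroup reachable?" probe
  private final Supplier<Future<?>> persist;    // hypothetical stand-in for the multiMutate write
  private volatile boolean online;              // readable without taking the monitor

  OnlineGuardSketch(BooleanSupplier tableAvailable, Supplier<Future<?>> persist) {
    this.tableAvailable = tableAvailable;
    this.persist = persist;
  }

  /** Re-evaluates table availability and records it in the online flag. */
  void refreshOnline() {
    online = tableAvailable.getAsBoolean();
  }

  /** Runs refreshOnline() every periodMs milliseconds on the given scheduler. */
  void startRefresher(ScheduledExecutorService scheduler, long periodMs) {
    scheduler.scheduleAtFixedRate(this::refreshOnline, 0, periodMs, TimeUnit.MILLISECONDS);
  }

  boolean isOnline() {
    return online;
  }

  /** Hypothetical rsgroup operation, reduced to its persist step; fails fast while offline. */
  synchronized void moveServers() throws IOException {
    if (!isOnline()) {
      // Exit early: otherwise the untimed get() below could hold this monitor
      // indefinitely and stall every other synchronized rsgroup operation.
      throw new IOException("hbase:rsgroup is not online");
    }
    try {
      persist.get().get(); // untimed wait, mirroring the Future.get described above
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException(e);
    } catch (ExecutionException e) {
      throw new IOException(e.getCause());
    }
  }
}
```

The check narrows the window but does not close it. If the table goes offline between a refresh and the write, the untimed `get()` can still block. A bounded `Future.get(timeout, unit)` may be worth pairing with the check.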
Instead of blocking like this, we can take an "offline" code path. `flushConfig`
already has one, which only updates the in-memory state of the default group
([here](https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupInfoManagerImpl.java#L633-L662)),
but we could also change it to update the in-memory state while
asynchronously trying to persist it to `hbase:rsgroup` in the background.
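The asynchronous variant of the offline path could look roughly like the sketch below. It assumes a single-threaded background executor. It also assumes a hypothetical `tryWrite` predicate that stands in for writing the full mapping to `hbase:rsgroup`. Neither is existing HBase code. The in-memory map is updated under the lock, and persistence with retry happens off the caller's thread.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Predicate;

/**
 * Hypothetical sketch of an offline path that updates in-memory state right away
 * and persists it to hbase:rsgroup in the background. tryWrite is an assumed
 * stand-in for the table write (returns true on success), not an HBase API.
 */
class AsyncPersistSketch {
  private final Map<String, String> serverToGroup = new ConcurrentHashMap<>();
  private final ExecutorService background; // assumed single-threaded so writes stay ordered
  private final Predicate<Map<String, String>> tryWrite;
  private final long retryDelayMs;

  AsyncPersistSketch(ExecutorService background,
                     Predicate<Map<String, String>> tryWrite, long retryDelayMs) {
    this.background = background;
    this.tryWrite = tryWrite;
    this.retryDelayMs = retryDelayMs;
  }

  /** Applies the move in memory, then queues persistence without waiting on it. */
  synchronized Future<?> moveServer(String server, String group) {
    serverToGroup.put(server, group);
    // Snapshot under the lock so each background write sees a consistent mapping.
    Map<String, String> snapshot = Collections.unmodifiableMap(new HashMap<>(serverToGroup));
    return background.submit(() -> persistWithRetry(snapshot));
  }

  /** Retries the write until it succeeds or the background thread is interrupted. */
  private void persistWithRetry(Map<String, String> snapshot) {
    while (!tryWrite.test(snapshot)) {
      try {
        Thread.sleep(retryDelayMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
    }
  }

  /** Reads the in-memory state, which may be ahead of hbase:rsgroup. */
  String groupOf(String server) {
    return serverToGroup.get(server);
  }
}
```

A single-threaded executor persists snapshots in the order they were taken, so the table converges to the latest in-memory state. The trade-off is that `hbase:rsgroup` can lag behind memory. Any change not yet persisted would be lost if the process died before the write succeeded.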
Please correct me if I misunderstood anything. What do you think about this
rationale?
--
This is an automated message from the Apache Git Service.