Happy Friday! Nick and I have dug in a bit and have a solution that we're happy with here: https://github.com/apache/hbase/pull/6651
I'll leave this open for further review this weekend. If you have experience or opinions here, please feel free to voice them.

Thanks!
Ray Mattingly

On 2025/02/10 23:46:14 Ray Mattingly wrote:
> Hey all,
>
> Right now the balancer only supports decision-making on continuous scales;
> these scales are represented as cost functions. Often there is no
> continuity to the nature of the decisions the balancer must make, though:
> for example, "should this region replica be colocated with these others?"
> is a yes or no question.
>
> These cost functions also are not evaluated in a vacuum: raising the
> multiplier of one implicitly makes all others less important. When making
> discrete decisions, it does not make sense to have this implicit
> deprioritization, particularly if your cluster is configured appropriately
> (i.e., it has more racks than replicas, etc). Further, comparing something
> like "replica host colocation cost" to "read request cost" is a real
> apples:oranges situation... it forces you to find some magic incantation of
> balancer costs that will ultimately keep the balance triggering when
> necessary, but not superfluously. I think this is a tough situation to put
> operators in; better would be if we could just solve for something like
> replica distribution, and *then* balance based only on the more fuzzy
> things like read cost.
>
> I also believe that we could achieve better reliability via new features if
> the balancer had such a mechanism for making clear, binary decisions ahead
> of cost-based balancing. It would make it possible, for example, to isolate
> tables from one another without the complexity overhead and resource waste
> of managing RSGroups. This may technically be possible today via cost
> functions, but you would need a rocket scientist to manage your balancer,
> and most don't have one on staff.
>
> I've written up a larger proposal here:
> https://docs.google.com/document/d/1jA8Ghs86v7b-53j5DcsdbPnOXxbHjewkIBFi1E4S1pY/edit?usp=sharing
> and have a corresponding pull request (
> https://github.com/apache/hbase/pull/6651) which introduces a framework for
> what I've called "balancer conditionals", as well as the first conditional
> implementation which will distribute replicas throughout a cluster
> (considering all replicas, including secondary+secondary colocation, which
> is not supported in the current cost function based approach).
>
> Thank you for reading, I am curious to know others' thoughts on this
> subject!
> Ray Mattingly
>
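
For anyone skimming, here is a toy sketch of the two ideas in the quoted message. It is not HBase code and does not reflect the API in the pull request: every name below (`share`, `violates_replica_distribution`, `pick_plan`, the plan layout) is hypothetical, and it assumes the total cost behaves roughly like a multiplier-weighted average of per-function costs. The first function shows why raising one multiplier shrinks the influence of every other cost; the rest show a yes/no check applied before any cost comparison.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical plan layout: (region name, replica id) -> server name.
Plan = Dict[Tuple[str, int], str]


def share(name: str, multipliers: Dict[str, float]) -> float:
    """Fraction of the total weighted cost that one cost function can influence.

    Assumes the balancer's total is a multiplier-weighted average.
    """
    return multipliers[name] / sum(multipliers.values())


def violates_replica_distribution(plan: Plan) -> bool:
    """Binary check: True if any two replicas of one region share a server."""
    seen = set()
    for (region, _replica_id), server in plan.items():
        if (region, server) in seen:
            return True
        seen.add((region, server))
    return False


def pick_plan(candidates: List[Plan], cost_of: Callable[[Plan], float]) -> Plan:
    """Filter by the binary conditional first, then choose by cost.

    Falls back to all candidates if none satisfies the conditional.
    """
    allowed = [p for p in candidates if not violates_replica_distribution(p)]
    pool = allowed or candidates
    return min(pool, key=cost_of)


if __name__ == "__main__":
    multipliers = {"replica_colocation": 100.0, "read_request": 5.0}
    before = share("read_request", multipliers)
    multipliers["replica_colocation"] = 1000.0
    after = share("read_request", multipliers)
    print(f"read_request share: {before:.4f} -> {after:.4f}")
```

With the conditional applied up front, a plan that colocates replicas is rejected even if its fuzzy costs look better, so the multipliers only need to be tuned among the genuinely continuous costs.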