Thanks for raising the question, Ray. I think many people will be familiar with the issues you illustrate so well in the design doc: the experience of a cluster suffering from a few regions that simply won't conform. This approach rings true given my experience with pod scheduling on Kubernetes, where hard placement constraints are satisfied before softer preferences are weighed. Overall, I think this is a step in the right direction that retains the fundamentals of the existing Stochastic implementation. I've left more comments in the doc and PR.
Thanks,
Nick

On Tue, Feb 11, 2025 at 12:46 AM Ray Mattingly <rmattin...@apache.org> wrote:
>
> Hey all,
>
> Right now the balancer only supports decision-making on continuous scales;
> these scales are represented as cost functions. Often there is no
> continuity to the nature of the decisions the balancer must make, though:
> for example, "should this region replica be colocated with these others?"
> is a yes-or-no question.
>
> These cost functions also are not evaluated in a vacuum: raising the
> multiplier of one implicitly makes all others less important. When making
> discrete decisions, it does not make sense to have this implicit
> deprioritization, particularly if your cluster is configured appropriately
> (i.e., it has more racks than replicas, etc.). Further, comparing something
> like "replica host colocation cost" to "read request cost" is a real
> apples:oranges situation... it forces you to find some magic incantation of
> balancer costs that will ultimately keep the balancer triggering when
> necessary, but not superfluously. I think this is a tough situation to put
> operators in; better would be if we could just solve for something like
> replica distribution, and *then* balance based only on the fuzzier
> things like read cost.
>
> I also believe that we could achieve better reliability via new features if
> the balancer had such a mechanism for making clear, binary decisions ahead
> of cost-based balancing. It would make it possible, for example, to isolate
> tables from one another without the complexity overhead and resource waste
> of managing RSGroups. This may technically be possible today via cost
> functions, but you would need a rocket scientist to manage your balancer,
> and most don't have one on staff.
>
> I've written up a larger proposal here:
> https://docs.google.com/document/d/1jA8Ghs86v7b-53j5DcsdbPnOXxbHjewkIBFi1E4S1pY/edit?usp=sharing
> and have a corresponding pull request
> (https://github.com/apache/hbase/pull/6651) which introduces a framework for
> what I've called "balancer conditionals", as well as the first conditional
> implementation, which will distribute replicas throughout a cluster
> (considering all replicas, including secondary+secondary colocation, which
> is not supported in the current cost-function-based approach).
>
> Thank you for reading; I am curious to know others' thoughts on this
> subject!
> Ray Mattingly
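Ray's point that raising one cost function's multiplier implicitly deprioritizes all the others can be seen in a simplified model. The sketch below assumes the balancer's trigger compares a multiplier-weighted average of per-function costs against a threshold. That is a simplification, and the class and function names (CostFunction, effective_weights, weighted_cost, needs_balance) are hypothetical, not HBase's API; the costs, multipliers, and threshold are made-up illustrative numbers.

```python
from dataclasses import dataclass


@dataclass
class CostFunction:
    # Hypothetical stand-in for a balancer cost function: a normalized
    # cost in [0, 1] plus the operator-configured multiplier (weight).
    name: str
    cost: float
    multiplier: float


def effective_weights(functions):
    # Each function's share of the total weight. Raising one multiplier
    # shrinks every other function's share, even though their own
    # multipliers did not change.
    total = sum(f.multiplier for f in functions)
    return {f.name: f.multiplier / total for f in functions}


def weighted_cost(functions):
    # Simplified model (an assumption): multiplier-weighted average of
    # the per-function costs.
    total = sum(f.multiplier for f in functions)
    return sum(f.cost * f.multiplier for f in functions) / total


def needs_balance(functions, threshold):
    # Assumed trigger: balance only when the weighted cost exceeds the threshold.
    return weighted_cost(functions) > threshold


if __name__ == "__main__":
    read = CostFunction("read_request", cost=0.30, multiplier=5)
    replica = CostFunction("replica_host", cost=0.0, multiplier=5)
    print(needs_balance([read, replica], threshold=0.1))  # True: 0.15 > 0.1

    # The operator boosts the replica multiplier to make colocation "matter more".
    replica.multiplier = 95
    print(effective_weights([read, replica])["read_request"])  # 0.05
    print(needs_balance([read, replica], threshold=0.1))  # False: 0.015 < 0.1
```

With the replica multiplier raised, the same read-request imbalance no longer crosses the threshold, even though nothing about reads changed. Handling replica distribution as a separate yes-or-no decision ahead of cost-based balancing, as the proposal suggests, would avoid that coupling.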