Thanks for raising the question, Ray. I think that many people will be
familiar with the issues you illustrate so well in the design doc --
the experience of a cluster suffering from a few regions that simply
won't conform. The approach rings true to me, given my experience with
pod scheduling on Kubernetes. Overall, I think this is a step in the
right direction that retains the fundamentals of the existing
Stochastic implementation. More comments in the doc/PR.

Thanks,
Nick


On Tue, Feb 11, 2025 at 12:46 AM Ray Mattingly <rmattin...@apache.org> wrote:
>
> Hey all,
>
> Right now the balancer only supports decision-making on continuous scales;
> these scales are represented as cost functions. Often there is no
> continuity to the nature of the decisions the balancer must make, though —
> for example, "should this region replica be colocated with these others?"
> is a yes or no question.
>
> These cost functions also are not evaluated in a vacuum — raising the
> multiplier of one implicitly makes all others less important. When making
> discrete decisions, it does not make sense to have this implicit
> deprioritization, particularly if your cluster is configured appropriately
> (i.e., it has more racks than replicas, etc.). Further, comparing something
> like "replica host colocation cost" to "read request cost" is a real
> apples:oranges situation... it forces you to find some magic incantation of
> balancer costs that will ultimately keep the balancer triggering when
> necessary, but not superfluously. I think this is a tough situation to put
> operators in; better would be if we could just solve for something like
> replica distribution, and *then* balance based only on the more fuzzy
> things like read cost.
>
> I also believe that we could achieve better reliability via new features if
> the balancer had such a mechanism for making clear, binary decisions ahead
> of cost-based balancing. It would make it possible, for example, to isolate
> tables from one another without the complexity overhead and resource waste
> of managing RSGroups. This may technically be possible today via cost
> functions, but you would need a rocket scientist to manage your balancer,
> and most don't have one on staff.
>
> I've written up a larger proposal here:
> https://docs.google.com/document/d/1jA8Ghs86v7b-53j5DcsdbPnOXxbHjewkIBFi1E4S1pY/edit?usp=sharing
> and have a corresponding pull request (
> https://github.com/apache/hbase/pull/6651) which introduces a framework for
> what I've called "balancer conditionals", as well as the first conditional
> implementation which will distribute replicas throughout a cluster
> (considering all replicas, including secondary+secondary colocation, which
> is not supported in the current cost function based approach).
>
> Thank you for reading, I am curious to know others' thoughts on this
> subject!
> Ray Mattingly
