Happy Friday,

Nick and I have dug in a bit and have a solution that we're happy with here:
https://github.com/apache/hbase/pull/6651

I'll leave this open for further review this weekend — if you have experience 
or opinions here, please feel free to voice them.

Thanks!
Ray Mattingly

On 2025/02/10 23:46:14 Ray Mattingly wrote:
> Hey all,
> 
> Right now the balancer only supports decision-making on continuous scales,
> which are represented as cost functions. Often, though, the decisions the
> balancer must make are not continuous at all; for example, "should this
> region replica be colocated with these others?" is a yes-or-no question.
> 
> These cost functions are also not evaluated in a vacuum: because they are
> weighed against one another, raising the multiplier of one implicitly makes
> all the others less important. When making discrete decisions, this implicit
> deprioritization does not make sense, particularly if your cluster is
> configured appropriately (e.g., it has more racks than replicas). Further,
> comparing something like "replica host colocation cost" to "read request
> cost" is a real apples-to-oranges comparison. It forces you to find some
> magic incantation of balancer costs that will keep the balancer triggering
> when necessary, but not superfluously. I think this is a tough situation to
> put operators in; it would be better if we could first solve for something
> like replica distribution, and *then* balance based only on fuzzier things
> like read cost.
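To make the multiplier interaction concrete, here is a minimal sketch in Java. It assumes, as a simplification, that the total cost is a weighted average of per-function costs in [0, 1], normalized by the sum of all multipliers. `WeightedCostSketch` and the multiplier values are hypothetical and are not HBase's StochasticLoadBalancer code. The effect it shows holds for any weighted sum, though: a function's share of the total is its multiplier divided by the sum of all multipliers, so raising one multiplier shrinks every other function's share.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of cost-based balancing: each cost function
 * reports a cost in [0, 1], and the total is a weighted average using the
 * configured multipliers. An illustration only, not HBase's actual code.
 */
final class WeightedCostSketch {

  /** Share of the total weight held by one cost function: m_i / sum(m). */
  static double relativeWeight(Map<String, Double> multipliers, String name) {
    double sum = 0.0;
    for (double m : multipliers.values()) {
      sum += m;
    }
    return multipliers.get(name) / sum;
  }

  /** Weighted average of per-function costs; both maps share the same keys. */
  static double totalCost(Map<String, Double> multipliers, Map<String, Double> costs) {
    double weighted = 0.0;
    double sum = 0.0;
    for (Map.Entry<String, Double> e : multipliers.entrySet()) {
      weighted += e.getValue() * costs.get(e.getKey());
      sum += e.getValue();
    }
    return weighted / sum;
  }

  public static void main(String[] args) {
    // Illustrative multipliers only; these are not HBase defaults.
    Map<String, Double> multipliers = new LinkedHashMap<>();
    multipliers.put("replicaHost", 100.0);
    multipliers.put("readRequest", 5.0);
    double before = relativeWeight(multipliers, "readRequest");

    // Raise only the replica multiplier; the read-request share shrinks anyway.
    multipliers.put("replicaHost", 1000.0);
    double after = relativeWeight(multipliers, "readRequest");

    System.out.printf("readRequest share: %.4f -> %.4f%n", before, after);
  }
}
```

With these illustrative values, raising only the replica multiplier from 100 to 1000 cuts the read-request share roughly tenfold, even though the read-request setting itself was never touched.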
> 
> I also believe that we could achieve better reliability via new features if
> the balancer had such a mechanism for making clear, binary decisions ahead
> of cost-based balancing. It would make it possible, for example, to isolate
> tables from one another without the complexity overhead and resource waste
> of managing RSGroups. This may technically be possible today via cost
> functions, but you would need a rocket scientist to manage your balancer,
> and most don't have one on staff.
> 
> I've written up a larger proposal here:
> https://docs.google.com/document/d/1jA8Ghs86v7b-53j5DcsdbPnOXxbHjewkIBFi1E4S1pY/edit?usp=sharing
> and have a corresponding pull request,
> https://github.com/apache/hbase/pull/6651, that introduces a framework for
> what I've called "balancer conditionals". It also includes the first
> conditional implementation, which will distribute replicas throughout a
> cluster while considering all replicas, including secondary+secondary
> colocation (something the current cost-function-based approach does not
> support).
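The conditional idea can be sketched as a hard yes/no gate that runs before any cost function is consulted. The names below (`Replica`, `PlacementConditional`, `ReplicaDistributionConditional`, `allowsMove`) are hypothetical and are not the API in the pull request; the sketch only assumes that a candidate move is rejected outright when it would put two replicas of the same region on one host. Because the check ignores replica IDs, it rejects secondary+secondary colocation just as it rejects primary+secondary colocation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical: one copy of a region; replicaId 0 is the primary, higher IDs are secondaries. */
final class Replica {
  final String region;
  final int replicaId;

  Replica(String region, int replicaId) {
    this.region = region;
    this.replicaId = replicaId;
  }
  // Identity equality is enough here: each Replica object is a distinct map key.
}

/** Hypothetical yes/no placement rule, checked before any cost is computed. */
interface PlacementConditional {
  /** Returns true if the given replica-to-host assignment breaks the rule. */
  boolean isViolated(Map<Replica, String> replicaToHost);

  /** Rejects a candidate move outright if applying it would break the rule. */
  default boolean allowsMove(Map<Replica, String> current, Replica replica, String targetHost) {
    Map<Replica, String> proposed = new HashMap<>(current); // leave current untouched
    proposed.put(replica, targetHost);
    return !isViolated(proposed);
  }
}

/**
 * Violated when two replicas of the same region share a host. replicaId is
 * deliberately ignored, so secondary+secondary colocation is caught too.
 */
final class ReplicaDistributionConditional implements PlacementConditional {
  @Override
  public boolean isViolated(Map<Replica, String> replicaToHost) {
    Map<String, Set<String>> hostsByRegion = new HashMap<>();
    for (Map.Entry<Replica, String> e : replicaToHost.entrySet()) {
      Set<String> hosts = hostsByRegion.computeIfAbsent(e.getKey().region, r -> new HashSet<>());
      if (!hosts.add(e.getValue())) {
        return true; // another replica of this region is already on this host
      }
    }
    return false;
  }
}
```

In a full balancer, only moves that pass every conditional would then be scored by the cost functions, so the fuzzy trade-offs (read cost and the like) would no longer have to encode hard placement rules through multipliers.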
> 
> Thank you for reading; I'm curious to hear others' thoughts on this
> subject!
> Ray Mattingly
> 
