rmdmattingly opened a new pull request, #6593:
URL: https://github.com/apache/hbase/pull/6593

   See my design doc 
[here](https://docs.google.com/document/d/1jA8Ghs86v7b-53j5DcsdbPnOXxbHjewkIBFi1E4S1pY/edit?usp=sharing)
   
   To sum it up, the current load balancer isn't great for what it's supposed 
to do now, and it won't support all of the things that we'd like it to do in a 
perfect world.
   
   Right now: primary replica balancing squashes all other considerations. The 
default weight for one of the several cost functions that factor into primary 
replica balancing is 100,000. Meanwhile the default read request cost is 5. The 
result is that the load balancer, OOTB, basically doesn't care about balancing 
actual load. To solve this, you can either set primary replica balancing costs 
to zero, which is fine if you don't use read replicas, or — if you do use read 
replicas — maybe you can produce a magic incantation of configurations that 
work _just_ right, until your needs change.
   
   In the future: we'd like a lot more out of the balancer. System table 
isolation, meta table isolation, colocation of regions based on start key 
prefix similarity (this is a very rough idea atm, and not touched in the scope 
of this PR). And to support all of these features with either cost functions or 
RS groups would be a real burden. I think what I'm proposing here will be a 
much, much easier path for HBase operators.
   
   ## New features
   
   This PR introduces some new features:
   1. Balancer conditional-based replica distribution
   2. System table isolation (place all system tables, such as backups and quotas, on a single dedicated RegionServer)
   3. Meta table isolation (place meta on its own RegionServer)
   
   These can be controlled via:
   
   - hbase.master.balancer.stochastic.conditionals.distributeReplicas: set this 
to true to enable conditional-based replica distribution
   - hbase.master.balancer.stochastic.conditionals.isolateSystemTables: set 
this to true to enable system table isolation
   - hbase.master.balancer.stochastic.conditionals.isolateMetaTable: set this 
to true to enable meta table isolation
   - hbase.master.balancer.stochastic.additionalConditionals: much like cost 
functions, you can define your own RegionPlanConditional implementations and 
install them here
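   For example, enabling all three built-in conditionals plus a custom one in hbase-site.xml might look like the sketch below (the custom conditional class name is a hypothetical placeholder, not a class this PR ships):

```xml
<property>
  <name>hbase.master.balancer.stochastic.conditionals.distributeReplicas</name>
  <value>true</value>
</property>
<property>
  <name>hbase.master.balancer.stochastic.conditionals.isolateSystemTables</name>
  <value>true</value>
</property>
<property>
  <name>hbase.master.balancer.stochastic.conditionals.isolateMetaTable</name>
  <value>true</value>
</property>
<!-- Optional: install your own RegionPlanConditional implementations.
     The class name below is a hypothetical example. -->
<property>
  <name>hbase.master.balancer.stochastic.additionalConditionals</name>
  <value>org.example.MyRegionPlanConditional</value>
</property>
```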
   
   ## Testing
   
   I wrote a lot of unit tests to validate the functionality here — both 
lightweight and some minicluster tests. Even in the most extreme cases (e.g. 
system table isolation + meta table isolation enabled on a 3-node cluster, or 
the number of read replicas equal to the number of servers) the balancer does 
what we'd expect.
   
   ### Replica Distribution Improvements
   
   #### Perfect primary and secondary replica distribution
   
   Not only does this PR offer an alternative means of distributing replicas, 
but it's actually a massive improvement on the existing approach.
   
   See [the Replica Distribution testing section of my design 
doc](https://docs.google.com/document/d/1jA8Ghs86v7b-53j5DcsdbPnOXxbHjewkIBFi1E4S1pY/edit?tab=t.0).
 Cost functions never successfully balance 3 replicas across 3 servers OOTB — 
but balancer conditionals do so expeditiously.
   
   To summarize the testing, we have `replicated_table`, a table with 3 region 
replicas. The 3 replicas of a given region share a color, and there are also 3 
RegionServers in the cluster. We expect the balancer to evenly distribute one 
replica per server across the 3 RegionServers...
   
   **Cost functions don't work**:
   
![cf1](https://github.com/user-attachments/assets/1dccc536-eaa0-4775-878b-5a50d16d8ddf)
   
![cf2](https://github.com/user-attachments/assets/cc70264f-d10a-473e-b726-4ef85ec4ea4e)
   
   **…omitting the meaningless snapshots between 4 and 27…**
   
   
![cf28](https://github.com/user-attachments/assets/bc20781d-c166-4b07-910a-bec5515bfd5a)
   
   At this point, I just exited the test because it was clear that our existing 
balancer would never achieve true replica distribution.
   
   But **balancer conditionals do work**:
   
![bc1](https://github.com/user-attachments/assets/6d9248e6-64ec-4b0d-b12f-e064901e77f8)
   
![bc2](https://github.com/user-attachments/assets/d07c4803-b249-4d02-be54-ce0439c92f96)
   
![bc3](https://github.com/user-attachments/assets/229d1520-a6ef-4f61-83c9-b32dd2e7671d)
   
![bc4](https://github.com/user-attachments/assets/c0bd874a-8ac0-4882-8ffb-e4b0be59ba20)
   
![bc5](https://github.com/user-attachments/assets/a2f0e094-a3df-415f-9be0-cbb99cdb7494)
   
   #### Replica distribution performance improvements
   
   I've set up a large cluster test for conditional replica balancing, at an 
identical scale to the existing large cluster test for legacy replica 
balancing. It demonstrates a _significant_ improvement in balancer latency when 
dealing with 1k servers, 20k regions, 3 replicas per region, and 100 tables:
   <img width="515" alt="Screenshot 2025-01-04 at 11 48 55 AM" 
src="https://github.com/user-attachments/assets/15049da7-1e24-46af-971d-c22d2e07b8c5" />
   
   ### New Features: Table Isolation Working as Designed
   
   See below where we ran a new unit test, 
TestLargerClusterBalancerConditionals, and tracked the locations of regions for 
3 tables across 18 RegionServers:
   1. 180 “product” table regions
   2. 1 meta table region
   3. 1 quotas table region
   
   All regions began on a single RegionServer, and within 4 balancer iterations 
we had a well-balanced cluster with key system tables isolated. This took about 
2 minutes on my local machine, most of which was spent bootstrapping the mini 
cluster.
   
   ![output 
(2)](https://github.com/user-attachments/assets/51621524-aa0a-4701-9f6c-33ba76d76b76)
   
   ![output 
(3)](https://github.com/user-attachments/assets/e0302493-5222-4627-8d59-55ad1c2129bf)
   
   ![output 
(5)](https://github.com/user-attachments/assets/22774f87-aa01-4d12-9887-aef567cc8685)
   
   ![output 
(4)](https://github.com/user-attachments/assets/a9050a29-f71e-4c8f-9809-2b88cadebacb)
   
   #### Table isolation performance testing
   
   Likewise, we created large tests for system table isolation, meta table 
isolation, multi-table isolation, and multi-table isolation + replica 
distribution. These tests reliably converge on the expected placements, and do 
so quickly on my local machine for 100 servers and 10k+ regions; all tests 
pass within a few minutes.
   
   cc @ndimiduk @charlesconnell @ksravista @aalhour 

