thetumbled opened a new pull request, #20972:
URL: https://github.com/apache/pulsar/pull/20972

   
   ### Motivation
   The load balance module in the Pulsar broker puts great pressure on zk, 
because each broker writes and reads a znode for every bundle. A cluster can 
contain hundreds of thousands of bundles, so brokers write and read hundreds 
of thousands of znodes.
   
   All load shedding algorithms pick bundles from top to bottom based on 
throughput/msgRate, so bundles with low throughput/msgRate can never be 
selected for shedding. There is therefore no need to update the bundleData of 
these bundles in zk frequently.
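
The reasoning above can be illustrated with a minimal sketch. It is not code from Pulsar's shedding strategies; the class `TopDownSheddingSketch`, the method `selectForShedding`, and the bundle names are hypothetical. It assumes a simplified strategy that sorts bundles by throughput in descending order and stops as soon as the amount to offload is covered, so the low-throughput tail is never reached.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified illustration of top-down bundle selection.
final class TopDownSheddingSketch {

    /**
     * Picks bundles in descending throughput order until the selected
     * bundles together cover at least bytesToOffload.
     */
    static List<String> selectForShedding(Map<String, Double> throughputByBundle,
                                          double bytesToOffload) {
        List<Map.Entry<String, Double>> sorted =
                new ArrayList<>(throughputByBundle.entrySet());
        // Highest throughput first.
        sorted.sort(Map.Entry.<String, Double>comparingByValue().reversed());

        List<String> selected = new ArrayList<>();
        double offloaded = 0;
        for (Map.Entry<String, Double> entry : sorted) {
            if (offloaded >= bytesToOffload) {
                break; // target met; remaining (smaller) bundles are ignored
            }
            selected.add(entry.getKey());
            offloaded += entry.getValue();
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, Double> bundles = new LinkedHashMap<>();
        bundles.put("bundle-a", 5_000_000.0);
        bundles.put("bundle-b", 3_000_000.0);
        bundles.put("bundle-c", 50_000.0);
        bundles.put("bundle-d", 10_000.0);
        // Prints [bundle-a, bundle-b]; the two small bundles are never reached.
        System.out.println(selectForShedding(bundles, 6_000_000.0));
    }
}
```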
   
   
   ### Modifications
   Add two configuration options:
   ```
       @FieldContext(
               dynamic = true,
               category = CATEGORY_LOAD_BALANCER,
               doc = "Minimum inbound throughput of a bundle for it to be considered for"
                       + " updating data in the metadata store"
       )
       private int loadBalancerBundleThroughputThresholdInByte = 0;
   
       @FieldContext(
               dynamic = true,
               category = CATEGORY_LOAD_BALANCER,
               doc = "Minimum inbound message rate of a bundle for it to be considered for"
                       + " updating data in the metadata store"
       )
       private int loadBalancerBundleMsgThreshold = 0;
   ```
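
How the two thresholds could gate writes to the metadata store can be sketched as follows. This is not the code from this PR; `BundleUpdateFilterSketch`, `BundleStats`, `meetsThresholds`, and `bundlesToWrite` are hypothetical names. The sketch assumes that a bundle is written only when it meets both thresholds, so that the default value of 0 leaves the existing behavior unchanged and setting only one option still filters.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of threshold-based filtering before a metadata write.
final class BundleUpdateFilterSketch {

    /** Simplified stand-in for per-bundle load statistics (assumed shape). */
    static final class BundleStats {
        final double throughputInBytes; // inbound throughput
        final double msgRateIn;         // inbound message rate

        BundleStats(double throughputInBytes, double msgRateIn) {
            this.throughputInBytes = throughputInBytes;
            this.msgRateIn = msgRateIn;
        }
    }

    /** Assumed semantics: both thresholds must be met; 0 accepts everything. */
    static boolean meetsThresholds(BundleStats stats,
                                   int throughputThresholdInByte,
                                   int msgThreshold) {
        return stats.throughputInBytes >= throughputThresholdInByte
                && stats.msgRateIn >= msgThreshold;
    }

    /** Returns the names of the bundles whose data would be written. */
    static List<String> bundlesToWrite(Map<String, BundleStats> bundles,
                                       int throughputThresholdInByte,
                                       int msgThreshold) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, BundleStats> entry : bundles.entrySet()) {
            if (meetsThresholds(entry.getValue(), throughputThresholdInByte, msgThreshold)) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, BundleStats> bundles = new LinkedHashMap<>();
        bundles.put("busy-bundle", new BundleStats(2 * 1024 * 1024, 5000));
        bundles.put("idle-bundle", new BundleStats(10 * 1024, 20));
        // With a 100 KB throughput threshold only the busy bundle is written.
        System.out.println(bundlesToWrite(bundles, 100 * 1024, 0)); // prints [busy-bundle]
    }
}
```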
   
   To see the effect, first look at the throughput distribution of bundles in 
a cluster:
   
![image](https://github.com/apache/pulsar/assets/52550727/e1a400a5-c027-48fb-91f9-6d20864138c1)
   There are about 1k bundles in the cluster, but only 24 of them have inbound 
throughput above 1MB. More than 80% of the bundles have inbound throughput 
below 0.1MB, which makes them barely useful to the shedding algorithm.
   So we can configure broker.conf:
   ```
   # 100 KB = 100 * 1024 bytes
   loadBalancerBundleThroughputThresholdInByte=102400
   ```
   
   The results are as follows:
   <img width="945" alt="image" 
src="https://github.com/apache/pulsar/assets/52550727/50825aee-f919-417c-9735-cf393738dea1">
   Total write throughput to the /loadbalance path in zk decreased from 8kb/s 
to 3kb/s.
   
   <img width="621" alt="image" 
src="https://github.com/apache/pulsar/assets/52550727/48662804-847d-47b0-a84a-f9ebb8818227">
   Update latency decreased from 200ms to 4ms.
   
   <img width="950" alt="image" 
src="https://github.com/apache/pulsar/assets/52550727/bc08ca28-09ba-40dd-89c3-3e0f3b5c2227">
   <img width="638" alt="image" 
src="https://github.com/apache/pulsar/assets/52550727/40a22f3b-a44b-4eac-9135-b479d52bca20">
   
   As write throughput decreases, read throughput and latency also decrease 
significantly.
   
   
   ### Verifying this change
   
   - [ ] Make sure that the change passes the CI checks.
   
   
   ### Does this pull request potentially affect one of the following parts:
   
   <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
   
   *If the box was checked, please highlight the changes*
   
   - [ ] Dependencies (add or upgrade a dependency)
   - [ ] The public API
   - [ ] The schema
   - [ ] The default values of configurations
   - [ ] The threading model
   - [ ] The binary protocol
   - [ ] The REST endpoints
   - [ ] The admin CLI options
   - [ ] The metrics
   - [ ] Anything that affects deployment
   
   ### Documentation
   
   <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
   
   - [ ] `doc` <!-- Your PR contains doc changes. -->
   - [ ] `doc-required` <!-- Your PR changes impact docs and you will update 
later -->
   - [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
   - [ ] `doc-complete` <!-- Docs have been already added -->
   
   ### Matching PR in forked repository
   
   PR in forked repository: <!-- ENTER URL HERE -->
   
   

