thetumbled commented on code in PR #21011: URL: https://github.com/apache/pulsar/pull/21011#discussion_r1299917269
########## pip/pip-294.md: ##########
@@ -0,0 +1,67 @@
+# Background knowledge
+
+There are two main `LoadManager` implementations in the Pulsar broker: `ExtensibleLoadManager` and `ModularLoadManagerImpl`. `ModularLoadManagerImpl` is the default load manager, and `ExtensibleLoadManager` is a new load manager proposed after version 3.0.0.
+
+## ModularLoadManagerImpl
+`ModularLoadManagerImpl` relies on ZooKeeper (zk) to store and synchronize load metadata, which puts great pressure on zk and threatens the stability of the system. Every broker uploads its `LocalBrokerData` to zk, and the leader broker retrieves all `LocalBrokerData` from zk, generates the `BundleData` from each `LocalBrokerData`, and updates all `BundleData` back to zk.
+
+## ExtensibleLoadManager
+`ExtensibleLoadManager` depends on system topics and table views to store and replicate load-balance metadata. Although it does not use zk to store and synchronize load metadata, it is still necessary to control the number of bundles that need to be updated, for which `ExtensibleLoadManager` provides a `loadBalancerMaxNumberOfBundlesInBundleLoadReport` configuration that selects the top k bundles.
+
+# Motivation
+
+## ModularLoadManagerImpl
+Since every bundle in the cluster corresponds to a zk node, it is common for a cluster to have thousands of zk nodes, which results in thousands of read/update operations against zk. This puts a lot of pressure on zk.
+
+**Since all load shedding algorithms pick bundles from top to bottom based on throughput/msgRate, bundles with low throughput/msgRate are rarely selected for shedding. So there is no need to update their `BundleData` to zk frequently.**
+
+## ExtensibleLoadManager
+Since the number of bundles and the throughput in the cluster change dynamically, users cannot easily select a reasonable number of bundles, whereas with a throughput-based threshold we can be sure that bundles with throughput less than 0.1M are useless for load balancing no matter how the cluster changes.
+
+We can also add this throughput-based bundle report size control to the `ExtensibleLoadManager`, working together with `loadBalancerMaxNumberOfBundlesInBundleLoadReport`.

Review Comment:
Configuring a good value for `loadBalancerMaxNumberOfBundlesInBundleLoadReport` is not easy, because the throughput and the number of bundles change dynamically. For example, if the shedding algorithm needs to pick bundles totaling 100M of throughput, and the least throughput of any bundle in the top-k bundle set is greater than 100M, there will be no candidate bundle for unloading. We can't determine in advance how many bundles we need for load balancing; it is a dynamic value, since both the throughput and the number of bundles change dynamically. But we can be sure that, in most cases, bundles with throughput less than 0.1Kb are useless for load balancing no matter how the cluster changes. For clusters with extremely low throughput, we do not need to enable this feature, which is why **I suggest disabling this feature by default, or setting a pretty low default value.** Anyway, it's not that necessary for clusters with extremely low throughput to do load balancing, right?

**More fundamentally, it is because the shedding algorithm picks bundles based on throughput/msg-rate that we should also filter bundles based on throughput/msg-rate.**

WDYT, @codelipenghui
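The proposed threshold-based filtering can be sketched as a simple pre-filter over per-bundle load stats before they go into the load report. This is a minimal illustration, not Pulsar's actual implementation: the `BundleLoad` record and the `filterForReport` helper are hypothetical stand-ins for Pulsar's `BundleData`/`NamespaceBundleStats`, and the threshold parameter mirrors the kind of config this PIP proposes (the real config name is defined by the PIP, not here).

```java
import java.util.List;
import java.util.stream.Collectors;

public class BundleFilterSketch {
    // Hypothetical per-bundle load record; Pulsar's real classes are
    // BundleData / NamespaceBundleStats.
    record BundleLoad(String bundle, double throughputBytesPerSec, double msgRate) {}

    // Keep only bundles whose throughput meets the threshold; low-traffic
    // bundles are dropped from the load report since shedding algorithms
    // would never pick them anyway.
    static List<BundleLoad> filterForReport(List<BundleLoad> all, double minThroughputBytes) {
        return all.stream()
                .filter(b -> b.throughputBytesPerSec() >= minThroughputBytes)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<BundleLoad> bundles = List.of(
                new BundleLoad("tenant/ns/0x00_0x40", 5_000_000, 120),  // 5 MB/s: reported
                new BundleLoad("tenant/ns/0x40_0x80", 50, 0.01),        // ~0 B/s: skipped
                new BundleLoad("tenant/ns/0x80_0xc0", 200_000, 30));    // 200 KB/s: reported
        // With a 100 KB/s threshold, only two of the three bundles are reported.
        List<BundleLoad> report = filterForReport(bundles, 100_000);
        System.out.println(report.size()); // prints 2
    }
}
```

A threshold of zero (or disabling the feature by default, as suggested above) preserves today's behavior for low-throughput clusters, while high-throughput clusters can raise it to shrink the report independently of the top-k count.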
