Anonymitaet commented on code in PR #644: URL: https://github.com/apache/pulsar-site/pull/644#discussion_r1267452731
##########
docs/concepts-broker-load-balancing-concepts.md:
##########
@@ -4,4 +4,554 @@ title: Concepts
 sidebar_label: "Concepts"
 ---
 
-WIP. Stay tuned!
\ No newline at end of file
+````mdx-code-block
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+````
+
+Pulsar provides robust support for load balancing to ensure efficient utilization of resources across Pulsar clusters. Load balancing in Pulsar involves distributing messages and partitions evenly among brokers and consumers to prevent hotspots and optimize performance.
+
+Before getting started with load balancing, it's important to review the key components to ensure that resources are utilized efficiently and varying workloads can be handled by the system effectively.
+
+## Brokers
+
+In a Pulsar cluster, [brokers](./reference-terminology.md#broker) are responsible for serving messages for different topics and partitions. Broker load balancing ensures that each broker handles a proportional share of the load.
+
+## Producers
+
+[Producers](./reference-terminology.md#producer) in Pulsar are responsible for publishing messages to topics. Pulsar clients (producers) connect to brokers to publish messages. Producer load balancing (i.e., the connection pooling mechanism in Pulsar) ensures that producers are distributed across brokers to avoid overwhelming a single broker with too many connections.
+
+## Consumers
+
+[Consumers](./reference-terminology.md#consumer) in Pulsar are responsible for consuming messages from topics. Depending on how consumer load balancing is configured (i.e., using exclusive or shared consumers or auto-rebalancing), you can ensure even load distribution.
+
+## Topics
+
+[Topics](./reference-terminology.md#topic) are the basic units for clients to publish and consume messages. Related topics are logically grouped into a namespace. To manage metadata efficiently and keep track of all the topics moving through the system, Pulsar groups topics into topic bundles by partitioning each namespace.
+
+
+
+## Bundles
+
+[Bundles](./reference-terminology.md#namespace-bundle) represent a range of partitions for a particular namespace in Pulsar, comprising a portion of the overall hash range of the namespace.
+
+Pulsar introduces bundles to represent a middle-layer group. Each bundle is an **assignment unit**, which means topics are assigned to brokers at the **bundle** level rather than the topic level.
+
+## Broker load balancing
+
+The broker load balancer component is like a "traffic cop" sitting between clients and brokers. It balances topic sessions across brokers based on dynamic load data, such as broker resource usage (e.g., CPU, memory, network IO) and topic/bundle loads (e.g., throughput).
+
+When properly balanced, the brokers can handle increased traffic and ensure that the system can scale seamlessly to accommodate growing workloads. Load balancing helps prevent bottlenecks and ensures that the resources of the cluster are utilized optimally, leading to better throughput and reduced message processing latency.
+
+
+
+## Topic bundling
+
+Topic bundling refers to the process of grouping topics into bundles. Pulsar organizes topics into bundles within a namespace. Each bundle is a range of partitions, and Pulsar can automatically distribute these bundles across brokers to achieve load balancing. This allows the cluster to scale more efficiently as brokers can independently manage their assigned bundles.
+
+For example,
+
+- Topic load statistics (e.g., message rates) are aggregated at the **bundle** layer, which reduces the cardinality of load samples to monitor.
+
+- For dynamic topic-broker assignments, Pulsar persists these mappings at the **bundle** level, which decreases the space for storing dynamic topic-broker ownerships.
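
As a rough illustration of the bundle idea described in the diff above, the Python sketch below splits a namespace's hash range into equal bundles, maps topic names to bundles, and aggregates per-topic message rates at the bundle level. Everything in it is an assumption made for illustration: the 32-bit hash space, the use of CRC32 as a stand-in hash function, the equal-width split, and the helper names `bundle_boundaries`, `bundle_for_topic`, and `aggregate_by_bundle`. It does not reproduce the broker's actual implementation.

```python
import bisect
import zlib
from collections import defaultdict

FULL_RANGE = 1 << 32  # assumed 32-bit hash space per namespace (illustrative)


def bundle_boundaries(num_bundles):
    """Split [0, 2**32) into equal ranges; returns num_bundles + 1 boundaries."""
    starts = [i * FULL_RANGE // num_bundles for i in range(num_bundles)]
    return starts + [FULL_RANGE]


def bundle_for_topic(topic, boundaries):
    """Return the index of the bundle whose half-open range holds the topic hash."""
    # CRC32 is a stand-in hash chosen for this sketch; it yields a value in [0, 2**32).
    topic_hash = zlib.crc32(topic.encode("utf-8"))
    return bisect.bisect_right(boundaries, topic_hash) - 1


def aggregate_by_bundle(topic_rates, boundaries):
    """Sum per-topic message rates into one load sample per bundle."""
    totals = defaultdict(float)
    for topic, rate in topic_rates.items():
        totals[bundle_for_topic(topic, boundaries)] += rate
    return dict(totals)


if __name__ == "__main__":
    boundaries = bundle_boundaries(4)  # 4 bundles is an arbitrary choice here
    rates = {
        "persistent://tenant/ns/orders": 120.0,
        "persistent://tenant/ns/payments": 80.0,
        "persistent://tenant/ns/audit": 5.0,
    }
    for bundle, load in sorted(aggregate_by_bundle(rates, boundaries).items()):
        print(f"bundle {bundle}: {load} msg/s")
```

However many topics the namespace holds, the balancer in this sketch only has to track one load sample and one broker assignment per bundle, which is the cardinality reduction the two bullets above describe.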
Review Comment: ok
