Huanli-Meng commented on code in PR #644:
URL: https://github.com/apache/pulsar-site/pull/644#discussion_r1266690296
##########
docs/concepts-broker-load-balancing-concepts.md:
##########
@@ -4,4 +4,554 @@ title: Concepts
sidebar_label: "Concepts"
---
-WIP. Stay tuned!
\ No newline at end of file
+````mdx-code-block
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+````
+
+Pulsar provides robust support for load balancing to ensure efficient
utilization of resources across Pulsar clusters. Load balancing in Pulsar
involves distributing messages and partitions evenly among brokers and
consumers to prevent hotspots and optimize performance.
+
+Before getting started with load balancing, it's important to review the key
components to ensure that resources are utilized efficiently and varying
workloads can be handled by the system effectively.
+
+## Brokers
+
+In a Pulsar cluster, [brokers](./reference-terminology.md#broker) are
responsible for serving messages for different topics and partitions. Broker
load balancing ensures that each broker handles a proportional share of the
load.
+
+## Producers
+
+[Producers](./reference-terminology.md#producer) in Pulsar are responsible for
publishing messages to topics. Pulsar clients (producers) connect to brokers to
publish messages. Producer load balancing (i.e., connection pooling mechanism
in Pulsar) ensures that producers are distributed across brokers to avoid
overwhelming a single broker with too many connections.
+
+## Consumers
+
+[Consumers](./reference-terminology.md#consumer) in Pulsar are responsible for
consuming messages from topics. Depending on how consumer load balancing is
configured (i.e., using exclusive or shared consumers or auto-rebalancing), you
can ensure even load distribution.
+
+## Topics
+
+[Topics](./reference-terminology.md#topic) are the basic units for clients to
publish and consume messages. Related topics are logically grouped into a
namespace. To efficiently manage metadata and keep track of all of them moving
through the system, Pulsar uses a strategy of grouping topics by partitioning
on a namespace to create topic bundles.
Review Comment:
```suggestion
[Topics](./reference-terminology.md#topic) are the basic units for clients
to publish and consume messages. Related topics are logically grouped into a
namespace. To efficiently manage metadata and keep track of all of them moving
through the system, Pulsar groups topics by partitioning on a namespace to
create topic bundles.
```
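To make "partitioning on a namespace to create topic bundles" concrete for readers of this thread, here is a minimal sketch of sharding a hash range into equal bundles. It is an illustration only, not Pulsar's implementation: the 32-bit hash space and the name `shard_namespace` are assumptions made for this example.

```python
# Hypothetical sketch: cut an assumed 32-bit hash space into equal bundles.
HASH_SPACE = 2 ** 32          # assumption: hashes are 32-bit values
FULL_RANGE_END = 0xFFFFFFFF   # assumption: upper end of the last bundle


def shard_namespace(num_bundles: int) -> list[int]:
    """Return num_bundles + 1 boundaries that delimit the bundles."""
    if num_bundles < 1:
        raise ValueError("num_bundles must be at least 1")
    step = HASH_SPACE // num_bundles
    boundaries = [i * step for i in range(num_bundles)]
    boundaries.append(FULL_RANGE_END)
    return boundaries


if __name__ == "__main__":
    # Print the boundaries for a namespace sharded into 4 bundles.
    print([f"0x{b:08X}" for b in shard_namespace(4)])
```

Consecutive boundaries delimit one bundle each, so a namespace with 4 bundles is described by 5 boundary values.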
##########
docs/concepts-broker-load-balancing-concepts.md:
##########
@@ -4,4 +4,554 @@ title: Concepts
sidebar_label: "Concepts"
---
-WIP. Stay tuned!
\ No newline at end of file
+````mdx-code-block
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+````
+
+Pulsar provides robust support for load balancing to ensure efficient
utilization of resources across Pulsar clusters. Load balancing in Pulsar
involves distributing messages and partitions evenly among brokers and
consumers to prevent hotspots and optimize performance.
Review Comment:
```suggestion
Pulsar provides robust load balancing to ensure efficient utilization of
resources across Pulsar clusters. Load balancing in Pulsar involves
distributing messages and partitions evenly among brokers and consumers to
prevent hotspots and optimize performance.
```
##########
docs/concepts-broker-load-balancing-concepts.md:
##########
@@ -4,4 +4,554 @@ title: Concepts
sidebar_label: "Concepts"
---
-WIP. Stay tuned!
\ No newline at end of file
+````mdx-code-block
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+````
+
+Pulsar provides robust support for load balancing to ensure efficient
utilization of resources across Pulsar clusters. Load balancing in Pulsar
involves distributing messages and partitions evenly among brokers and
consumers to prevent hotspots and optimize performance.
+
+Before getting started with load balancing, it's important to review the key
components to ensure that resources are utilized efficiently and varying
workloads can be handled by the system effectively.
+
+## Brokers
+
+In a Pulsar cluster, [brokers](./reference-terminology.md#broker) are
responsible for serving messages for different topics and partitions. Broker
load balancing ensures that each broker handles a proportional share of the
load.
+
+## Producers
+
+[Producers](./reference-terminology.md#producer) in Pulsar are responsible for
publishing messages to topics. Pulsar clients (producers) connect to brokers to
publish messages. Producer load balancing (i.e., connection pooling mechanism
in Pulsar) ensures that producers are distributed across brokers to avoid
overwhelming a single broker with too many connections.
+
+## Consumers
+
+[Consumers](./reference-terminology.md#consumer) in Pulsar are responsible for
consuming messages from topics. Depending on how consumer load balancing is
configured (i.e., using exclusive or shared consumers or auto-rebalancing), you
can ensure even load distribution.
+
+## Topics
+
+[Topics](./reference-terminology.md#topic) are the basic units for clients to
publish and consume messages. Related topics are logically grouped into a
namespace. To efficiently manage metadata and keep track of all of them moving
through the system, Pulsar uses a strategy of grouping topics by partitioning
on a namespace to create topic bundles.
+
+
+
+## Bundles
+
+[Bundles](./reference-terminology.md#namespace-bundle) represent a range of
partitions for a particular namespace in Pulsar, comprising a portion of the
overall hash range of the namespace.
+
+Bundle is introduced in Pulsar to represent a middle-layer group. Each bundle
is an **assignment unit**, which means topics are assigned to brokers at the
**bundle** level rather than the topic level.
+
+## Broker load balancing
+
+The broker load balancer component is like a "traffic cop" sitting between
clients and brokers. It balances topic sessions across brokers based on dynamic
load data, such as broker resource usage (e.g., CPU, memory, network IO) and
topic/bundle loads (e.g., throughput).
+
+When properly balanced, the brokers can handle increased traffic and ensure
that the system can scale seamlessly to accommodate growing workloads. Load
balancing helps prevent bottlenecks and ensures that the resources of the
cluster are utilized optimally, leading to better throughput and reduced
message processing latency.
+
+
+
+## Topic bundling
+
+Topic bundling refers to the process of grouping topics into bundles. Pulsar
organizes topics into bundles within a namespace. Each bundle is a range of
partitions, and Pulsar can automatically distribute these bundles across
brokers to achieve load balancing. This allows the cluster to scale more
efficiently as brokers can independently manage their assigned bundles.
+
+For example,
+
+- Topic load statistics (e.g., message rates) are aggregated at the **bundle**
layer, which reduces the cardinality of load samples to monitor.
+
+- For dynamic topic-broker assignments, Pulsar persists these mappings at the
**bundle **level, which decreases the space for storing dynamic topic-broker
ownerships.
Review Comment:
Is it "at the **bundle **level" or "at the **bundle** layer"? Should we keep the wording consistent?
##########
docs/concepts-broker-load-balancing-concepts.md:
##########
@@ -4,4 +4,554 @@ title: Concepts
sidebar_label: "Concepts"
---
-WIP. Stay tuned!
\ No newline at end of file
+````mdx-code-block
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+````
+
+Pulsar provides robust support for load balancing to ensure efficient
utilization of resources across Pulsar clusters. Load balancing in Pulsar
involves distributing messages and partitions evenly among brokers and
consumers to prevent hotspots and optimize performance.
+
+Before getting started with load balancing, it's important to review the key
components to ensure that resources are utilized efficiently and varying
workloads can be handled by the system effectively.
+
+## Brokers
+
+In a Pulsar cluster, [brokers](./reference-terminology.md#broker) are
responsible for serving messages for different topics and partitions. Broker
load balancing ensures that each broker handles a proportional share of the
load.
+
+## Producers
+
+[Producers](./reference-terminology.md#producer) in Pulsar are responsible for
publishing messages to topics. Pulsar clients (producers) connect to brokers to
publish messages. Producer load balancing (i.e., connection pooling mechanism
in Pulsar) ensures that producers are distributed across brokers to avoid
overwhelming a single broker with too many connections.
+
+## Consumers
+
+[Consumers](./reference-terminology.md#consumer) in Pulsar are responsible for
consuming messages from topics. Depending on how consumer load balancing is
configured (i.e., using exclusive or shared consumers or auto-rebalancing), you
can ensure even load distribution.
+
+## Topics
+
+[Topics](./reference-terminology.md#topic) are the basic units for clients to
publish and consume messages. Related topics are logically grouped into a
namespace. To efficiently manage metadata and keep track of all of them moving
through the system, Pulsar uses a strategy of grouping topics by partitioning
on a namespace to create topic bundles.
+
+
+
+## Bundles
+
+[Bundles](./reference-terminology.md#namespace-bundle) represent a range of
partitions for a particular namespace in Pulsar, comprising a portion of the
overall hash range of the namespace.
+
+Bundle is introduced in Pulsar to represent a middle-layer group. Each bundle
is an **assignment unit**, which means topics are assigned to brokers at the
**bundle** level rather than the topic level.
+
+## Broker load balancing
+
+The broker load balancer component is like a "traffic cop" sitting between
clients and brokers. It balances topic sessions across brokers based on dynamic
load data, such as broker resource usage (e.g., CPU, memory, network IO) and
topic/bundle loads (e.g., throughput).
+
+When properly balanced, the brokers can handle increased traffic and ensure
that the system can scale seamlessly to accommodate growing workloads. Load
balancing helps prevent bottlenecks and ensures that the resources of the
cluster are utilized optimally, leading to better throughput and reduced
message processing latency.
+
+
+
+## Topic bundling
+
+Topic bundling refers to the process of grouping topics into bundles. Pulsar
organizes topics into bundles within a namespace. Each bundle is a range of
partitions, and Pulsar can automatically distribute these bundles across
brokers to achieve load balancing. This allows the cluster to scale more
efficiently as brokers can independently manage their assigned bundles.
+
+For example,
+
+- Topic load statistics (e.g., message rates) are aggregated at the **bundle**
layer, which reduces the cardinality of load samples to monitor.
+
+- For dynamic topic-broker assignments, Pulsar persists these mappings at the
**bundle **level, which decreases the space for storing dynamic topic-broker
ownerships.
+
+Pulsar allows you to dynamically scale the number of brokers, producers, and
consumers to adapt to changing workloads. As brokers are added or removed,
Pulsar handles the redistribution of partitions and bundles automatically.
+
+### Workflow
+
+Below is the workflow for grouping topics into bundles.
+
+#### Step 1: shard namespaces into bundles
+
+Internally, when a namespace is created, the namespace is sharded into a list
of bundles.
+
+#### Step 2: assign topics to bundles
+
+When a topic is created or looked up for pub/sub sessions, brokers map the
topic to a particular bundle by taking the hash of the topic name (for example,
hash("my-topic") = 0x0000000F) and checking in which bundle the hash falls.
+
+Here "topic" means either a **non-partitioned topic** or **one partition of a
partitioned topic**. For partitioned topics, Pulsar internally considers
partitions as separate topics, hence different partitions can be assigned to
different bundles and brokers.
+
+
+
+## Bundle assignment
+
+Bundle assignment refers to assigning bundles to brokers dynamically based on
changing conditions.
+
+For example, based on broker resource usage (e.g., CPU, memory, network IO)
and bundle loads (e.g., throughput), a bundle is dynamically assigned to a
particular broker. Each bundle is independent of the others and thus is
independently assigned to different brokers. Each broker takes ownership of a
bundle (aka, a subset of the topics for a namespace).
+
+Bundle assignment plays a crucial role in achieving efficient load
distribution and scalability within a Pulsar cluster. The purpose of bundle
assignments is to ensure balanced resource utilization and facilitate dynamic
scaling within the Pulsar architecture.
+
+### Workflow
+
+Below is the workflow for dynamic bundle assignment.
+
+#### Step 1: assign bundles to brokers dynamically
+
+When a client starts using new topics (bundles) that are not assigned to any
broker, a process is triggered to choose the best-suited broker to acquire
ownership of these bundles according to the load conditions.
+
+#### Step 2: reassign bundles to other brokers (optional)
+
+If a broker owning a bundle crashes, the bundle (topic) is reassigned to
another available broker.
+
+
+
+To discover the current bundle-broker ownership for a given topic, Pulsar uses
a server-side discovery mechanism that redirects clients to the owner brokers'
URLs. This discovery logic requires:
+
+- Bundle key ranges for a given namespace, to map a topic to a bundle.
+
+- Bundle-broker ownership mapping, to direct the client to the current owner
or to trigger a new ownership acquisition in case there is no broker assigned.
+
+- All bundle ranges and broker-bundle ownership mappings are stored in a
metadata space, and brokers look up them when clients try to discover owner
brokers. For performance reasons, these data are cached at the broker in-memory
layer too.
+
+## Bundle splitting
+
+Bundle splitting refers to the process of identifying and splitting overloaded
bundles, which helps reduce hot spots, achieve more granular load balancing,
improve resource utilization, and enable finer-grained horizontal scaling
within the Pulsar cluster.
+
+The bundle splitting process involves breaking down the original bundle into
smaller bundles, each containing a subset of the original partitions. This
allows for better distribution of the message and processing load across
brokers in the cluster.
+
+You can split bundles in the following ways:
+
+- Automatic: enable Pulsar's automatic bundle splitting process when a
namespace has a significant increase in workload or the number of partitions
exceeds the optimal capacity for a single bundle.
+
+- Manual: trigger bundle splitting manually, to divide an existing bundle into
multiple smaller bundles.
+
+Bundle splitting methods|Definition|When to use
+|---|---|---
+Automatic|Bundles are split automatically based on different [bundle splitting
algorithms](#bundle-splitting-algorithms). | Automatic bundle splitting is most
commonly used.<br/><br/>You can use this method in various scenarios, such as
when a bundle remains hot for a long time.
+Manual|Bundles are split manually based on specified positions.|Manual bundle
splitting serves as a supplementary approach to automatic bundle
splitting.<br/><br/>You can use this method in various scenarios, such as:
<br/><br/> - If automatic bundle splitting is enabled, but there are still
bundles that remain hot for a long time. <br/><br/> - If you want to split
bundles and redistribute traffic evenly before having any broker overloaded.
+
+### Workflow
+
+Below is the workflow for splitting bundles automaticaly or manually.
+
+````mdx-code-block
+<Tabs groupId="bundle-splitting-workflow"
+ defaultValue="Automatic bundle splitting"
+ values={[{"label":"Automatic bundle splitting","value":"Automatic bundle
splitting"},{"label":"Manual bundle splitting","value":"Manual bundle
splitting"}]}>
+<TabItem value="Automatic bundle splitting">
+
+#### Step 1: find target bundles
+
+If the auto bundle split is enabled,
+
+- For the modular load balancer, the leader broker will check if any bundle's
load is beyond the threshold.
+
+- For the extensible load balancer, the load manager will check the bundle's
load in each owner broker.
+
+Bundle splitting threshold can be set based on various conditions. Any
existing bundle that exceeds any of the thresholds is a candidate to be split.
The load balancer assigns the newly split bundles to other brokers, to
facilitate the traffic distribution.
+
+For how to enable bundle split and set bundle split thresholds automatically,
see TBD (the docs is WIP, stay tuned!).
+
+#### Step 2: compute bundle splitting boundaries
+
+Now the target bundles which need to be split are found. Before splitting, the
owner broker needs to compute the splitting positions based on [bundle
splitting algorithms](#bundle-splitting-algorithms).
+
+#### Step 3: split bundles by boundaries
+
+Now the owner broker starts splitting the target bundles and then repartition
them.
+
+After the split, the owner broker updates the bundle ownerships and ranges in
the metadata space. The newly split bundles can be automatically unloaded from
the owner broker.
+
+For example, if the bundle partition is [0x0000, 0x8000, 0xFFFF], and the
splitting boundary is [0x4000] on the target bundle range, [0x0000, 0x8000).
+
+Then the bundle partitions after split is [0x0000, 0x4000, 0x8000, 0xFFFF].
+
+Then the bundle ranges after split is [0x0000, 0x4000), [0x4000, 0x8000), and
[0x8000, 0xFFFF].
+
+</TabItem>
+<TabItem value="Manual bundle splitting">
+
+#### Step 1: find target bundles
+
+Based on the broker resource usage (for example, the number of topics or
sessions, message rates, or bandwidth), you can choose a hot bundle to split.
+
+#### Step 2: compute bundle splitting position boundaries
+
+- If you want to use the specified_positions_divide algorithm, you need to
specify a splitting boundary.
+
+- If you want to use other [bundle splitting
algorithms](#bundle-splitting-algorithms) except for the
specified_positions_divide algorithm, those algorithms will calculate the
position automatically.
+
+Step 3: split bundles at the specific boundaries from step 2.
+
+For how to split bundles manually, see TBD (the docs is WIP, stay tuned!).
+
+</TabItem>
+
+</Tabs>
+````
+
+### Bundle splitting algorithms
+
+Bundle splitting positions can be calculated using different bundle splitting
algorithms.
+
+Below is a brief summary of bundle splitting algorithms.
+
+Bundle splitting algorithm | Definition | When to use|Available in automatic
or manual method? |Available version
+|---|---|---|---|---
+range_equally_divide|Split a bundle into two parts with the same hash range
size.|This is the **default** bundle splitting algorithm. <br/><br/> Use when
there are a large number of topics.| - Automatic <br/> - Manual|Pulsar 1.7 and
later versions
+topic_count_equally_divide| Split a bundle into two parts with the same number
of topics.|Use when there are a small number of topics.|- Automatic <br/> -
Manual | Pulsar 2.6 and later versions
+specified_positions_divide|Split a bundle into several parts by the specified
positions.|Use when the automatic bundle splitting is turned off, or a bundle
is not split even if the automatic bundle splitting is turned on. <br/><br/>
**Note**: Be cautious when using this algorithm. For example, if bundles are
split into **too many small parts**, then these bundles could not be hit by the
hash key. Currently, **bundle compaction is not supported**.|- Manual | Pulsar
2.11 and later versions
+flow_or_qps_equally_divide | Split a bundle into several parts based on
message rate and throughput.| Use when splitting bundles proportional to
traffic.|- Automatic <br/> - Manual | Pulsar 3.0 and later versions
+
+#### range_equally_divide
+
+range_equally_divide splits a bundle into two parts with the same hash range
size.
+
+For example, if the target bundle to split is (0x00000000, 0x80000000), then
the bundle split boundary is [0x40000000].
+
+
+
+#### topic_count_equally_divide
+
+topic_count_equally_divide splits a bundle into two parts with the same number
of topics.
+
+For example, if there are 6 topics in the target bundle [0x00000000,
0x80000000), then you can set the bundle splitting boundary at 0x50000000 to
make the left and right sides of the number of topics the same.
+
+```
+hash(topic1) = 0x10000000
+hash(topic2) = 0x20000000
+hash(topic3) = 0x35000000
+hash(topic4) = 0x65000000
+hash(topic5) = 0x70000000
+hash(topic6) = 0x75000000
+```
+
+That is, the target bundle to split is [0x00000000, 0x80000000), and the
bundle split boundary is [0x50000000].
+
+
+
+For implementation details, see [PR-6241: support evenly distribute topics
count when splitting bundles](https://github.com/apache/pulsar/pull/6241).
+
+#### specified_positions_divide
+
+specified_positions_divide splits bundles into several parts by specified
positions.
+
+For example, if you have 2 large topics and there are on the same bundle.
Topic1 is on at 0x30000000, Topic2 is on at 0x35000000, and the bundle range is
[0x00000000, 0x40000000), then you can set the bundle split boundary as
0x33000000.
Review Comment:
```suggestion
For example, if you have 2 large topics on the same bundle, where Topic1 is
at 0x30000000, Topic2 is at 0x35000000, and the bundle range is
[0x00000000, 0x40000000), then you can set the bundle split boundary at
0x33000000.
```
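A small sketch may also help readers check the numbers in this example. It splits a bundle range at caller-specified positions and shows which side each topic lands on. The helper names `split_at_positions` and `bundle_for` are hypothetical and are not part of Pulsar's API; the hash values come from the paragraph above.

```python
# Hypothetical sketch of splitting a bundle at specified positions.
def split_at_positions(start: int, end: int, positions: list[int]) -> list[tuple[int, int]]:
    """Split [start, end) into sub-ranges at the given positions."""
    cuts = sorted(set(positions))
    for p in cuts:
        if not (start < p < end):
            raise ValueError(f"position 0x{p:08X} is outside the bundle range")
    edges = [start, *cuts, end]
    # Pair each edge with the next one to form the resulting ranges.
    return list(zip(edges, edges[1:]))


def bundle_for(hash_value: int, bundles: list[tuple[int, int]]) -> tuple[int, int]:
    """Return the sub-range that contains hash_value."""
    for low, high in bundles:
        if low <= hash_value < high:
            return (low, high)
    raise ValueError("hash is not covered by any bundle")


if __name__ == "__main__":
    parts = split_at_positions(0x00000000, 0x40000000, [0x33000000])
    for name, h in (("Topic1", 0x30000000), ("Topic2", 0x35000000)):
        low, high = bundle_for(h, parts)
        print(f"{name} -> [0x{low:08X}, 0x{high:08X})")
```

With the boundary at 0x33000000, the two large topics end up in different bundles, which is the point of choosing the position by hand.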
##########
docs/concepts-broker-load-balancing-concepts.md:
##########
@@ -4,4 +4,554 @@ title: Concepts
sidebar_label: "Concepts"
---
-WIP. Stay tuned!
\ No newline at end of file
+````mdx-code-block
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+````
+
+Pulsar provides robust support for load balancing to ensure efficient
utilization of resources across Pulsar clusters. Load balancing in Pulsar
involves distributing messages and partitions evenly among brokers and
consumers to prevent hotspots and optimize performance.
+
+Before getting started with load balancing, it's important to review the key
components to ensure that resources are utilized efficiently and varying
workloads can be handled by the system effectively.
+
+## Brokers
+
+In a Pulsar cluster, [brokers](./reference-terminology.md#broker) are
responsible for serving messages for different topics and partitions. Broker
load balancing ensures that each broker handles a proportional share of the
load.
+
+## Producers
+
+[Producers](./reference-terminology.md#producer) in Pulsar are responsible for
publishing messages to topics. Pulsar clients (producers) connect to brokers to
publish messages. Producer load balancing (i.e., connection pooling mechanism
in Pulsar) ensures that producers are distributed across brokers to avoid
overwhelming a single broker with too many connections.
+
+## Consumers
+
+[Consumers](./reference-terminology.md#consumer) in Pulsar are responsible for
consuming messages from topics. Depending on how consumer load balancing is
configured (i.e., using exclusive or shared consumers or auto-rebalancing), you
can ensure even load distribution.
+
+## Topics
+
+[Topics](./reference-terminology.md#topic) are the basic units for clients to
publish and consume messages. Related topics are logically grouped into a
namespace. To efficiently manage metadata and keep track of all of them moving
through the system, Pulsar uses a strategy of grouping topics by partitioning
on a namespace to create topic bundles.
+
+
+
+## Bundles
+
+[Bundles](./reference-terminology.md#namespace-bundle) represent a range of
partitions for a particular namespace in Pulsar, comprising a portion of the
overall hash range of the namespace.
+
+Bundle is introduced in Pulsar to represent a middle-layer group. Each bundle
is an **assignment unit**, which means topics are assigned to brokers at the
**bundle** level rather than the topic level.
+
+## Broker load balancing
+
Review Comment:
Should this heading be "Broker load balancer"? The introduction says "it's
important to review the key components to ensure...", and the broker load
balancer reads more like a component than broker load balancing does.
##########
docs/concepts-broker-load-balancing-concepts.md:
##########
@@ -4,4 +4,554 @@ title: Concepts
sidebar_label: "Concepts"
---
-WIP. Stay tuned!
\ No newline at end of file
+````mdx-code-block
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+````
+
+Pulsar provides robust support for load balancing to ensure efficient
utilization of resources across Pulsar clusters. Load balancing in Pulsar
involves distributing messages and partitions evenly among brokers and
consumers to prevent hotspots and optimize performance.
+
+Before getting started with load balancing, it's important to review the key
components to ensure that resources are utilized efficiently and varying
workloads can be handled by the system effectively.
+
+## Brokers
+
+In a Pulsar cluster, [brokers](./reference-terminology.md#broker) are
responsible for serving messages for different topics and partitions. Broker
load balancing ensures that each broker handles a proportional share of the
load.
+
+## Producers
+
+[Producers](./reference-terminology.md#producer) in Pulsar are responsible for
publishing messages to topics. Pulsar clients (producers) connect to brokers to
publish messages. Producer load balancing (i.e., connection pooling mechanism
in Pulsar) ensures that producers are distributed across brokers to avoid
overwhelming a single broker with too many connections.
+
+## Consumers
+
+[Consumers](./reference-terminology.md#consumer) in Pulsar are responsible for
consuming messages from topics. Depending on how consumer load balancing is
configured (i.e., using exclusive or shared consumers or auto-rebalancing), you
can ensure even load distribution.
+
+## Topics
+
+[Topics](./reference-terminology.md#topic) are the basic units for clients to
publish and consume messages. Related topics are logically grouped into a
namespace. To efficiently manage metadata and keep track of all of them moving
through the system, Pulsar uses a strategy of grouping topics by partitioning
on a namespace to create topic bundles.
+
+
+
+## Bundles
+
+[Bundles](./reference-terminology.md#namespace-bundle) represent a range of
partitions for a particular namespace in Pulsar, comprising a portion of the
overall hash range of the namespace.
+
+Bundle is introduced in Pulsar to represent a middle-layer group. Each bundle
is an **assignment unit**, which means topics are assigned to brokers at the
**bundle** level rather than the topic level.
Review Comment:
```suggestion
Bundles in Pulsar represent middle-layer groups. Each bundle is an
**assignment unit**, which means topics are assigned to brokers at the
**bundle** level rather than the topic level.
```
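To illustrate what "assignment unit" means here, the following is a minimal sketch of resolving a topic to its owner broker through its bundle. It is not Pulsar's lookup code: `zlib.crc32` is only a stand-in hash, and the boundaries, broker names, and function names are assumptions made for the example.

```python
# Hypothetical sketch: topics reach a broker only through their bundle.
import zlib
from bisect import bisect_right


def topic_hash(topic: str) -> int:
    """Stand-in 32-bit hash; not the hash function Pulsar uses."""
    return zlib.crc32(topic.encode("utf-8")) & 0xFFFFFFFF


def find_bundle(boundaries: list[int], hash_value: int) -> tuple[int, int]:
    """Return the (low, high) bundle range that contains hash_value."""
    idx = bisect_right(boundaries, hash_value) - 1
    # The last bundle also includes the upper end of the hash range.
    idx = min(idx, len(boundaries) - 2)
    return (boundaries[idx], boundaries[idx + 1])


def owner_of(topic: str, boundaries: list[int], ownership: dict) -> str:
    """Look up the broker that owns the bundle the topic falls into."""
    return ownership[find_bundle(boundaries, topic_hash(topic))]


if __name__ == "__main__":
    boundaries = [0x00000000, 0x80000000, 0xFFFFFFFF]
    ownership = {
        (0x00000000, 0x80000000): "broker-1",  # hypothetical broker names
        (0x80000000, 0xFFFFFFFF): "broker-2",
    }
    print(owner_of("my-topic", boundaries, ownership))
```

Because ownership is keyed by bundle range, every topic that hashes into the same bundle resolves to the same broker, and moving a bundle moves all of its topics at once.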
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.