Jialun Peng created KAFKA-19507:
-----------------------------------
Summary: Optimize Replica Assignment for Broker Load Balance in
Uneven Rack Configurations
Key: KAFKA-19507
URL: https://issues.apache.org/jira/browse/KAFKA-19507
Project: Kafka
Issue Type: Improvement
Reporter: Jialun Peng
h3. Issue Description
Kafka's current replica assignment strategy prioritizes _balancing replica
counts across racks_ (availability zones in cloud environments) over _balancing
replicas across individual brokers_. While this ensures rack diversity, it
creates significant broker-level load imbalance when racks contain unequal
numbers of brokers.
h3. Problem Illustration
Consider a topic with replication factor 3 on a cluster with 3 racks:
* *Rack A*: Brokers 1, 4
* *Rack B*: Brokers 2, 5
* *Rack C*: Broker 3 (single broker)
Under the current strategy:
* Brokers 1, 2, 4, 5 each receive 1/6 of all replicas
* Broker 3 receives 1/3 of all replicas (twice the load of others)
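This makes Broker 3 the bottleneck for the whole cluster (the "bucket effect":
usable capacity is capped by the most heavily loaded broker), because it carries
twice the traffic and storage of any other broker.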
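The arithmetic behind these fractions can be checked with a short sketch. It models only the outcome described above (every rack ends up with an equal share of replicas, split evenly among that rack's brokers), not Kafka's actual assignment code. The function name {{rack_balanced_shares}} and the dictionary layout are illustrative assumptions.

```python
from fractions import Fraction


def rack_balanced_shares(racks):
    """Per-broker share of all replicas, assuming every rack receives an
    equal share and the brokers inside a rack split that share evenly.
    (Simplified model for illustration, not Kafka's assignment code.)"""
    rack_share = Fraction(1, len(racks))
    return {
        broker: rack_share / len(brokers)
        for brokers in racks.values()
        for broker in brokers
    }


# Rack layout from the illustration above.
racks = {"A": [1, 4], "B": [2, 5], "C": [3]}
shares = rack_balanced_shares(racks)

for broker in sorted(shares):
    print(f"broker {broker}: {shares[broker]}")
```

Broker 3 comes out at 1/3 and every other broker at 1/6, matching the figures above.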
To mitigate this, deployments today must maintain broker counts as _multiples
of rack counts_ (e.g., 3, 6, 9 brokers for 3 racks). While this ensures
balance, it:
# *Restricts deployment flexibility*: Scaling clusters horizontally requires
adding/removing nodes in rack-sized increments.
# *Increases costs unnecessarily*: For example, 4 brokers might provide enough
capacity for a 3-rack setup, but users must deploy 6 brokers to stay balanced,
a 50% increase in broker count and the infrastructure cost that goes with it.
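As a rough sketch of the sizing overhead, the helper below rounds a required broker count up to the next multiple of the rack count. The name {{balanced_broker_count}} is hypothetical, and the overhead figure assumes cost scales linearly with broker count.

```python
def balanced_broker_count(min_brokers, rack_count):
    """Smallest multiple of rack_count that is >= min_brokers
    (integer ceiling division, so no floating-point rounding)."""
    return -(-min_brokers // rack_count) * rack_count


needed = 4                                     # brokers required for capacity
deployed = balanced_broker_count(needed, 3)    # brokers required for balance
overhead = (deployed - needed) / needed        # extra brokers, as a fraction

print(deployed, f"{overhead:.0%}")
```

For the case above this gives 6 deployed brokers and a 50% overhead.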
h3. Proposed Solution
Modify the assignment strategy to:
# *Prioritize broker-level balance* as the primary objective.
# *Weight rack-level distribution* by broker count per rack (e.g., a rack with
2 brokers receives twice the replicas of a rack with 1 broker).
h4. Benefits
* *Balanced load*: All brokers receive near-equal replicas regardless of rack
imbalance.
* *Deployment flexibility*: Clusters can scale to _any size_ as long as
{{rack_count ≥ replication_factor}}.
* *Cost efficiency*: Users deploy only necessary brokers.
h4. Example Scenario
_3 replicas, 4 racks with 5 brokers:_
* *Rack A*: Brokers 1, 5 → Receives 2/5 of replicas (distributed evenly
between Brokers 1 & 5)
* *Racks B, C, D*: 1 broker each → Each receives 1/5 of replicas
_Result_: Every broker handles exactly 1/5 of total replicas, eliminating
bottlenecks.
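The target shares in this scenario can be reproduced with a small sketch of the proposed weighting. It computes target shares only and is not an assignment algorithm. The function names are illustrative, and the broker IDs for Racks B, C and D (2, 3, 4) are assumed, since the scenario does not list them.

```python
from fractions import Fraction


def weighted_rack_shares(racks):
    """Target share of all replicas per rack, proportional to broker count."""
    total_brokers = sum(len(brokers) for brokers in racks.values())
    return {
        rack: Fraction(len(brokers), total_brokers)
        for rack, brokers in racks.items()
    }


def broker_shares(racks):
    """Per-broker share when each rack's target is split evenly inside it."""
    rack_share = weighted_rack_shares(racks)
    return {
        broker: rack_share[rack] / len(brokers)
        for rack, brokers in racks.items()
        for broker in brokers
    }


# Broker IDs for racks B, C, D are assumed for illustration.
racks = {"A": [1, 5], "B": [2], "C": [3], "D": [4]}
replication_factor = 3

rack_shares = weighted_rack_shares(racks)   # A: 2/5, others: 1/5
per_broker = broker_shares(racks)           # every broker: 1/5

# If a partition never places two replicas in the same rack, a rack can
# hold at most 1/replication_factor of all replicas. List racks whose
# weighted target exceeds that cap.
over_cap = [
    rack for rack, share in rack_shares.items()
    if share > Fraction(1, replication_factor)
]
print(rack_shares, per_broker, over_cap)
```

One point worth checking during design: if each partition keeps at most one replica per rack, a rack cannot hold more than 1/3 of all replicas at replication factor 3. Rack A's weighted target of 2/5 exceeds that cap, so the exact 1/5-per-broker split here would need some partitions to place two replicas in Rack A, or a slightly uneven split.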
h3. Request
We propose modifying the replica assignment algorithm to prioritize
broker-level replica balance and to weight each rack's share of replicas by
the number of brokers in that rack. This would let users size Kafka clusters
by required capacity rather than in multiples of the rack count, reducing
cost while maintaining rack awareness.