Zhanxiang (Patrick) Huang created KAFKA-7537:
------------------------------------------------

             Summary: Only include live brokers in the UpdateMetadataRequest 
sent to existing brokers if there is no change in the partition states
                 Key: KAFKA-7537
                 URL: https://issues.apache.org/jira/browse/KAFKA-7537
             Project: Kafka
          Issue Type: Improvement
          Components: controller
            Reporter: Zhanxiang (Patrick) Huang
            Assignee: Zhanxiang (Patrick) Huang


Currently, when brokers join or leave the cluster without any change in 
partition states, the controller sends UpdateMetadataRequests containing the 
states of all partitions to all brokers. But for existing brokers in the 
cluster, the metadata diff between the controller and the broker should only be 
the "live_brokers" info. Only brokers with an empty metadata cache need the full 
UpdateMetadataRequest. Sending the full UpdateMetadataRequest to all brokers 
can place non-negligible memory pressure on the controller.

Let's say we have N brokers and M partitions in the cluster, and we want to add 
1 brand new broker to the cluster. With RF=2, the memory footprint per 
partition in the UpdateMetadataRequest is ~200 Bytes. In the current controller 
implementation, if each of the N+1 RequestSendThreads serializes and sends out 
the UpdateMetadataRequest at roughly the same time (which is very likely the 
case), we will end up using (N+1) * M * 200B of memory. In a large Kafka 
cluster, we can have:
{noformat}
N=99
M=100k

Memory usage to send out UpdateMetadataRequest to all brokers:
100 * 100K * 200B = 2G

However, we only need to send out the full UpdateMetadataRequest to the newly 
added broker. We only need to include live broker ids (4B * 100 brokers) in the 
UpdateMetadataRequest sent to the existing 99 brokers. So the amount of data 
that is actually needed will be:
1 * 100K * 200B + 99 * (100 * 4B) = ~20M


We can potentially reduce the memory footprint, as well as the data transferred 
over the network, by 2G / 20M = ~100x.{noformat}
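
The estimate above can be reproduced with a short script. This is only a 
back-of-the-envelope sketch in Python, not controller code: the function and 
constant names are made up for illustration, and the 200B per partition and 4B 
per broker id figures are taken from the description as-is.

```python
# Rough sizing sketch; all names here are hypothetical, not Kafka APIs.
PER_PARTITION_BYTES = 200  # approximate per-partition footprint with RF=2 (from the issue)
BROKER_ID_BYTES = 4        # one int32 broker id (from the issue)


def full_broadcast_bytes(existing_brokers, new_brokers, partitions):
    """Every broker receives the state of every partition."""
    total_brokers = existing_brokers + new_brokers
    return total_brokers * partitions * PER_PARTITION_BYTES


def targeted_bytes(existing_brokers, new_brokers, partitions):
    """Only new brokers receive partition states; the rest get live broker ids."""
    total_brokers = existing_brokers + new_brokers
    full_requests = new_brokers * partitions * PER_PARTITION_BYTES
    light_requests = existing_brokers * total_brokers * BROKER_ID_BYTES
    return full_requests + light_requests


if __name__ == "__main__":
    current = full_broadcast_bytes(99, 1, 100_000)
    proposed = targeted_bytes(99, 1, 100_000)
    print(f"current:  {current} bytes")
    print(f"proposed: {proposed} bytes")
    print(f"ratio:    {current / proposed:.1f}x")
```

With N=99 existing brokers, 1 new broker and M=100k partitions, this gives 
about 2 GB for the current behavior versus about 20 MB for the proposed one.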
 

This issue hurts the scalability of a Kafka cluster. KIP-380 and 
KAFKA-7186 also help to further reduce the controller memory footprint.

 

In terms of implementation, we can keep some in-memory state on the controller 
side to differentiate existing brokers from uninitialized brokers (e.g. brand 
new brokers), so that if there is no change in partition states, we only send 
out live brokers info to existing brokers.
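
One way to picture that bookkeeping is the minimal Python sketch below. It is 
not the actual controller implementation (the controller is not written in 
Python), and every name in it (ControllerSketch, plan_updates, mark_sent, the 
"full" / "live_brokers_only" labels) is hypothetical. It also assumes that a 
broker which leaves and later rejoins comes back with an empty metadata cache.

```python
from dataclasses import dataclass, field


@dataclass
class ControllerSketch:
    """Hypothetical in-memory state; not Kafka's real controller classes."""
    live_brokers: set = field(default_factory=set)
    # Brokers that have already received a full UpdateMetadataRequest.
    initialized: set = field(default_factory=set)

    def on_broker_join(self, broker_id):
        self.live_brokers.add(broker_id)

    def on_broker_leave(self, broker_id):
        self.live_brokers.discard(broker_id)
        # Assumption: a broker that comes back starts with an empty cache.
        self.initialized.discard(broker_id)

    def plan_updates(self, partition_states_changed):
        """Decide, per live broker, which kind of request to send."""
        plan = {}
        for broker_id in sorted(self.live_brokers):
            if partition_states_changed or broker_id not in self.initialized:
                plan[broker_id] = "full"
            else:
                plan[broker_id] = "live_brokers_only"
        return plan

    def mark_sent(self, plan):
        """Record brokers that have now received the full partition states."""
        for broker_id, kind in plan.items():
            if kind == "full":
                self.initialized.add(broker_id)


if __name__ == "__main__":
    controller = ControllerSketch()
    for broker_id in (0, 1, 2):
        controller.on_broker_join(broker_id)
    controller.mark_sent(controller.plan_updates(partition_states_changed=False))

    controller.on_broker_join(3)  # brand new broker, no partition changes
    print(controller.plan_updates(partition_states_changed=False))
```

In this sketch only broker 3 is planned a full request after it joins; brokers 
0 to 2 get the live brokers info alone.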

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
