Marouane RAJI created KAFKA-8796:
------------------------------------

             Summary: A broker joining the cluster should be able to replicate without impacting the cluster
                 Key: KAFKA-8796
                 URL: https://issues.apache.org/jira/browse/KAFKA-8796
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 1.1.0
            Reporter: Marouane RAJI
         Attachments: image-2019-08-13-10-26-19-282.png, image-2019-08-13-10-28-42-337.png

Hi, 

We run a cluster of 50 brokers on AWS, handling up to 1.4M msgs/sec. We were using m4.2xlarge instances and are now moving to m5.2xlarge. Every time we replace a broker from scratch (our EBS volumes are tied to the EC2 instance), the bytes sent on the replaced broker increase significantly, and this seems to impact the whole cluster, increasing both produce time and fetch time.

This is our per-broker configuration:

{code:java}
broker.id=11
############################# Socket Server Settings #############################
# The port the socket server listens on
port=9092

advertised.host.name=ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com
# The number of threads handling network requests
num.network.threads=32
# The number of threads doing disk I/O
num.io.threads=16
# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=1048576
socket.request.max.bytes=104857600
# The max time a connection can be idle
connections.max.idle.ms = 60000
num.partitions=2
default.replication.factor=2
auto.leader.rebalance.enable=true
delete.topic.enable=true
compression.type=producer
log.message.format.version=0.9.0.1
message.max.bytes=8000000
# The minimum age of a log file to be eligible for deletion
log.retention.hours=48
log.retention.bytes=3000000000
log.segment.bytes=268435456
log.retention.check.interval.ms=60000
log.cleaner.enable=true
log.cleaner.dedupe.buffer.size=268435456
replica.fetch.max.bytes=8388608
replica.fetch.wait.max.ms=500
replica.lag.time.max.ms=10000
num.replica.fetchers = 3
# Auto creation of topics on the server
auto.create.topics.enable=true
controlled.shutdown.enable=true
inter.broker.protocol.version=0.10.2
unclean.leader.election.enabled=True
{code}
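One quick way to check a broker config like the one above for mistakes is to parse it as a Java properties file (the format server.properties uses) and compare its keys against the documented broker settings. The sketch below is hypothetical: the class {{BrokerConfigCheck}}, its methods, and the deliberately short {{KNOWN_KEYS}} allow-list are assumptions for illustration, not part of Kafka. It flags unrecognized keys (for example, the documented broker setting is {{unclean.leader.election.enable}}, so {{unclean.leader.election.enabled}} would be reported) and values containing {{#}}, since a properties file treats {{#}} as a comment only when it is the first non-blank character of a line.

{code:java}
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

public class BrokerConfigCheck {

    // Hypothetical allow-list: only the broker keys used in this report, spelled as in the
    // Kafka broker configuration docs. A real check should use the full documented list.
    static final Set<String> KNOWN_KEYS = Set.of(
        "broker.id", "port", "advertised.host.name", "num.network.threads",
        "num.io.threads", "socket.receive.buffer.bytes", "socket.request.max.bytes",
        "connections.max.idle.ms", "num.partitions", "default.replication.factor",
        "auto.leader.rebalance.enable", "delete.topic.enable", "compression.type",
        "log.message.format.version", "message.max.bytes", "log.retention.hours",
        "log.retention.bytes", "log.segment.bytes", "log.retention.check.interval.ms",
        "log.cleaner.enable", "log.cleaner.dedupe.buffer.size", "replica.fetch.max.bytes",
        "replica.fetch.wait.max.ms", "replica.lag.time.max.ms", "num.replica.fetchers",
        "auto.create.topics.enable", "controlled.shutdown.enable",
        "inter.broker.protocol.version", "unclean.leader.election.enable");

    // Parses the text with java.util.Properties (same syntax as server.properties).
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns, in sorted order, the keys that are not in KNOWN_KEYS (e.g. a misspelled key).
    static List<String> unknownKeys(String text) {
        List<String> result = new ArrayList<>();
        for (String key : new TreeSet<>(load(text).stringPropertyNames())) {
            if (!KNOWN_KEYS.contains(key)) {
                result.add(key);
            }
        }
        return result;
    }

    // Returns, in sorted order, the keys whose value contains '#': a trailing "# ..." on a
    // property line is not a comment, so it becomes part of the value.
    static List<String> inlineCommentValues(String text) {
        Properties props = load(text);
        List<String> result = new ArrayList<>();
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            if (props.getProperty(key).contains("#")) {
                result.add(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "# The max time a connection can be idle",
            "connections.max.idle.ms = 60000",
            "socket.request.max.bytes=104857600 # trailing text",
            "unclean.leader.election.enabled=True");
        // prints: unknown keys: [unclean.leader.election.enabled]
        System.out.println("unknown keys: " + unknownKeys(sample));
        // prints: inline comments: [socket.request.max.bytes]
        System.out.println("inline comments: " + inlineCommentValues(sample));
    }
}
{code}

In practice you would read the broker's actual server.properties into the string instead of the inline sample; the allow-list is the part that needs to track the Kafka version in use.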
 

This is what we see during replication: a large increase in bytes received on the replaced broker.

 

!image-2019-08-13-10-26-19-282.png!

!image-2019-08-13-10-28-42-337.png!

It isn't visible in the graphs above, but the produce time stayed elevated for 20 minutes.

We didn't see anything out of the ordinary in the logs.

Please let us know if there is anything wrong in our config, or whether this is a potential issue in Kafka that needs fixing.

Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
