Marouane RAJI created KAFKA-8796:
------------------------------------

Summary: A broker joining the cluster should be able to replicate without impacting the cluster
Key: KAFKA-8796
URL: https://issues.apache.org/jira/browse/KAFKA-8796
Project: Kafka
Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Marouane RAJI
Attachments: image-2019-08-13-10-26-19-282.png, image-2019-08-13-10-28-42-337.png
Hi,

We run a cluster of 50 brokers on AWS, handling up to 1.4M msgs/sec at peak. We were using m4.2xlarge instances and are now moving to m5.2xlarge. Every time we replace a broker from scratch (the EBS volumes are tied to the EC2 instance, so the new broker starts with empty disks), the bytes sent on the replaced broker increase significantly. This seems to impact the whole cluster, increasing produce and fetch times.

This is our configuration per broker:
{code:java}
broker.id=11

############################# Socket Server Settings #############################

# The port the socket server listens on
port=9092
advertised.host.name=ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com

# The number of threads handling network requests
num.network.threads=32

# The number of threads doing disk I/O
num.io.threads=16

# ... socket server
socket.receive.buffer.bytes=1048576
socket.request.max.bytes=104857600

# The max time a connection can be idle
connections.max.idle.ms = 60000

num.partitions=2
default.replication.factor=2
auto.leader.rebalance.enable=true
delete.topic.enable=true
compression.type=producer
log.message.format.version=0.9.0.1
message.max.bytes=8000000

# The minimum age of a log file to be eligible for deletion
log.retention.hours=48
log.retention.bytes=3000000000
log.segment.bytes=268435456
log.retention.check.interval.ms=60000
log.cleaner.enable=true
log.cleaner.dedupe.buffer.size=268435456

replica.fetch.max.bytes=8388608
replica.fetch.wait.max.ms=500
replica.lag.time.max.ms=10000
num.replica.fetchers = 3

# Auto creation of topics on the server
auto.create.topics.enable=true
controlled.shutdown.enable=true
inter.broker.protocol.version=0.10.2
unclean.leader.election.enabled=True
{code}

This is what we see during replication: a large increase in bytes received on the replaced broker.

!image-2019-08-13-10-26-19-282.png!

!image-2019-08-13-10-28-42-337.png!

It isn't visible in the graphs above, but the produce time stayed elevated for 20 minutes. We didn't see anything out of the ordinary in the logs.
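A broker that starts with empty EBS volumes has to re-fetch every partition replica it hosts from the current leaders, which is consistent with the traffic spike on the replaced broker. The Python sketch below parses a server.properties-style file (tolerating the `key = value` spacing used above) and gives a rough upper bound on how long that catch-up takes if it is drained at a fixed rate. The partition count and the rate are hypothetical inputs. `SUSPECT_KEYS` is a hypothetical, hand-maintained helper, not something shipped with Kafka; its one entry reflects that the documented broker setting is `unclean.leader.election.enable`, not `unclean.leader.election.enabled`. The bound assumes a partition's log can reach about `log.retention.bytes` plus one `log.segment.bytes` segment before size-based retention trims it; time-based retention (`log.retention.hours=48`) may keep logs smaller in practice.

```python
from typing import Dict, List

# Hypothetical, hand-maintained map of misspelled keys to the documented
# Kafka broker config names. Extend it as needed.
SUSPECT_KEYS = {
    "unclean.leader.election.enabled": "unclean.leader.election.enable",
}


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and '#' comments.

    Handles spaces around '=' (e.g. 'num.replica.fetchers = 3'), but not the
    full java.util.Properties syntax (no line continuations or escapes).
    """
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def find_suspect_keys(props: Dict[str, str]) -> List[str]:
    """Return one warning per key that appears in SUSPECT_KEYS."""
    return [
        f"{key} is probably meant to be {SUSPECT_KEYS[key]}"
        for key in props
        if key in SUSPECT_KEYS
    ]


def catchup_upper_bound_seconds(props: Dict[str, str],
                                partitions_on_broker: int,
                                rate_bytes_per_sec: float) -> float:
    """Rough worst-case time for an empty broker to re-fetch its replicas.

    Size-based retention deletes whole segments, so a partition log can sit
    above log.retention.bytes by up to roughly one segment. Both arguments
    besides props are caller-supplied assumptions.
    """
    if rate_bytes_per_sec <= 0:
        raise ValueError("rate must be positive")
    per_partition = (int(props["log.retention.bytes"])
                     + int(props["log.segment.bytes"]))
    return partitions_on_broker * per_partition / rate_bytes_per_sec


if __name__ == "__main__":
    # Excerpt of the broker config above (other keys omitted).
    config = """
log.retention.bytes=3000000000
log.segment.bytes=268435456
num.replica.fetchers = 3
unclean.leader.election.enabled=True
"""
    props = parse_properties(config)
    for warning in find_suspect_keys(props):
        print(warning)
    # Hypothetical inputs: 100 partition replicas, drained at 50 MB/s.
    seconds = catchup_upper_bound_seconds(props, 100, 50_000_000)
    print(f"upper bound: {seconds / 60:.0f} minutes")
```

With those hypothetical inputs the bound works out to about 6,537 seconds (roughly 1.8 hours). The trade-off it illustrates: capping the catch-up rate (Kafka's replication quotas are one such cap) protects produce and fetch latency for the rest of the cluster, but it stretches the time the new broker spends out of sync in proportion.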
Please let us know whether anything is wrong with our configuration, or whether this is a potential issue in Kafka that needs fixing. Thanks.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)