Hello!

There was some work done on KIP-723, which proposed to avoid setting TCP_NODELAY on all sockets. The metrics provided at the time showed a great reduction in packets per second in one use case.

I am running Kafka on Kubernetes and see some of the nodes getting overloaded by the number of packets per second they have to process. Many of these packets are tiny responses sent in sequence, coming from a Kafka broker (similar to the first capture pasted below from the reproducer). I too would wish to see a 5000% reduction of these packets!

The last question in the discussion, from 4 years ago, read:

> In order to figure out why you are getting so many small writes to the
> socket, it would help to know a bit more about your configuration. You are
> using SSL, right? I think if you used plaintext, sendfile would avoid this
> issue. What kind of disks are you using? We may find that increasing
> buffering in the SSL layer would avoid this issue without the drawbacks of
> enabling Nagle's algorithm.

The pull request https://github.com/apache/kafka/pull/10333 was later abandoned.

I managed to reproduce the "many small writes" on the socket using the Docker image provided on Docker Hub, and without using TLS (instructions below).

tcpdump shows that one Kafka response is split over many small packets:

18:25:20.658080 IP 10.90.0.10.9092 > 10.90.0.1.54994: Flags [P.], seq 126699:126773, ack 10506, win 50470, options [nop,nop,TS val 2935541743 ecr 3641178240], length 74
18:25:20.658260 IP 10.90.0.10.9092 > 10.90.0.1.54994: Flags [P.], seq 126773:127443, ack 10506, win 50470, options [nop,nop,TS val 2935541743 ecr 3641178240], length 670
18:25:20.658298 IP 10.90.0.10.9092 > 10.90.0.1.54994: Flags [P.], seq 127443:127499, ack 10506, win 50470, options [nop,nop,TS val 2935541743 ecr 3641178240], length 56
18:25:20.658326 IP 10.90.0.10.9092 > 10.90.0.1.54994: Flags [P.], seq 127499:128169, ack 10506, win 50470, options [nop,nop,TS val 2935541743 ecr 3641178240], length 670
18:25:20.658386 IP 10.90.0.10.9092 > 10.90.0.1.54994: Flags [P.], seq 128169:128225, ack 10506, win 50470, options [nop,nop,TS val 2935541743 ecr 3641178240], length 56
18:25:20.658418 IP 10.90.0.10.9092 > 10.90.0.1.54994: Flags [P.], seq 128225:128955, ack 10506, win 50470, options [nop,nop,TS val 2935541743 ecr 3641178240], length 730
18:25:20.658433 IP 10.90.0.10.9092 > 10.90.0.1.54994: Flags [P.], seq 128955:128958, ack 10506, win 50470, options [nop,nop,TS val 2935541743 ecr 3641178240], length 3
18:25:20.659021 IP 10.90.0.1.54994 > 10.90.0.10.9092: Flags [.], ack 128958, win 40523, options [nop,nop,TS val 3641188245 ecr 2935541743], length 0
18:25:20.661747 IP 10.90.0.1.54994 > 10.90.0.10.9092: Flags [P.], seq 10506:10715, ack 128958, win 41652, options [nop,nop,TS val 3641188247 ecr 2935541743], length 209
18:25:20.661775 IP 10.90.0.10.9092 > 10.90.0.1.54994: Flags [.], ack 10715, win 50470, options [nop,nop,TS val 2935541746 ecr 3641188247], length 0

Now, attaching to the Kafka broker with gdb and forcing TCP_NODELAY back to 0 using

set $val = (int*)malloc(4)
p *$val=0
p (int) setsockopt(202, 6, 1, $val, 4)

we see the traffic is much more efficient. A response of the same size (2259 bytes) now goes out in 3 segments instead of 7:

18:27:41.227352 IP 10.90.0.10.9092 > 10.90.0.1.54994: Flags [P.], seq 159445:159519, ack 13467, win 50470, options [nop,nop,TS val 2935682312 ecr 3641318810], length 74
18:27:41.227637 IP 10.90.0.10.9092 > 10.90.0.1.54994: Flags [.], seq 159519:160967, ack 13467, win 50470, options [nop,nop,TS val 2935682312 ecr 3641318810], length 1448
18:27:41.227710 IP 10.90.0.1.54994 > 10.90.0.10.9092: Flags [.], ack 160967, win 40922, options [nop,nop,TS val 3641328813 ecr 2935682312], length 0
18:27:41.227740 IP 10.90.0.10.9092 > 10.90.0.1.54994: Flags [P.], seq 160967:161704, ack 13467, win 50470, options [nop,nop,TS val 2935682312 ecr 3641328813], length 737
18:27:41.230141 IP 10.90.0.1.54994 > 10.90.0.10.9092: Flags [P.], seq 13467:13676, ack 161704, win 41652, options [nop,nop,TS val 3641328816 ecr 2935682312], length 209
18:27:41.270965 IP 10.90.0.10.9092 > 10.90.0.1.54994: Flags [.], ack 13676, win 50470, options [nop,nop,TS val 2935682356 ecr 3641328816], length 0

I think this demonstrates that Kafka does many small writes to the socket even without TLS. Do you think there is something obviously wrong that would explain this?
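
For readers decoding the numeric arguments of the gdb setsockopt call above: on Linux, level 6 is IPPROTO_TCP and option 1 is TCP_NODELAY (202 is simply the file descriptor of that connection in my broker process, so it will differ elsewhere). Below is a minimal Python sketch, not Kafka code, that checks those constants and toggles the option on a fresh, unconnected socket. The helper name set_nodelay is made up for illustration.

```python
import socket

# On Linux these resolve to 6 and 1, the same numbers passed to
# setsockopt(fd, 6, 1, ...) from gdb.
LEVEL = socket.IPPROTO_TCP
OPTION = socket.TCP_NODELAY


def set_nodelay(sock, enabled):
    """Set TCP_NODELAY on sock and return the value read back as a bool."""
    sock.setsockopt(LEVEL, OPTION, 1 if enabled else 0)
    return sock.getsockopt(LEVEL, OPTION) != 0


# An unconnected TCP socket is enough to set and read the option.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    nodelay_on = set_nodelay(s, True)    # Nagle disabled (what Kafka sets today)
    nodelay_off = set_nodelay(s, False)  # Nagle enabled (what the gdb call forces)

print(LEVEL, OPTION, nodelay_on, nodelay_off)
```

With TCP_NODELAY set to 0 the kernel is allowed to coalesce consecutive small writes into fewer segments, which is the effect visible in the second capture.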

~~~~~
Running the reproducer below on a CentOS Stream 10 VM:

# broker
sudo podman network create --subnet 10.90.0.0/26 --interface-name kafka0 kafkanet
sudo podman run --entrypoint cat apache/kafka:latest /etc/kafka/docker/server.properties > server.properties
sed -i 's@^advertised.listeners=.*@advertised.listeners=PLAINTEXT://10.90.0.10:9092@' server.properties
sudo podman run --network kafkanet --ip 10.90.0.10 -p 9092:9092 -p 9093:9093 --rm -d --name broker -v `pwd`/server.properties:/etc/kafka/docker/server.properties:Z apache/kafka:latest
for i in `seq 3` ; do sudo podman exec --workdir /opt/kafka/bin/ broker ./kafka-topics.sh --bootstrap-server localhost:9092 --create --topic test-topic-$i ; done

# producer1:
sudo podman exec -ti --workdir /opt/kafka/bin/ broker bash -c "while sleep 1; do echo prod1 ; done | ./kafka-console-producer.sh --bootstrap-server localhost:9092 --topic test-topic-1"

# producer2:
sudo podman exec -ti --workdir /opt/kafka/bin/ broker bash -c "while sleep 1; do echo prod2 ; done | ./kafka-console-producer.sh --bootstrap-server localhost:9092 --topic test-topic-2"

# producer3:
sudo podman exec -ti --workdir /opt/kafka/bin/ broker bash -c "while sleep 1; do echo prod3 ; done | ./kafka-console-producer.sh --bootstrap-server localhost:9092 --topic test-topic-3"

# consumer, I'm running that as my user
podman run --network host --rm -ti --workdir /opt/kafka/bin/ --name client --entrypoint bash apache/kafka:latest ./kafka-console-consumer.sh --bootstrap-server 10.90.0.10:9092 --include test-topic-. --consumer-property fetch.max.wait.ms=10000 --consumer-property heartbeat.interval.ms=20000 --consumer-property session.timeout.ms=60000 --consumer-property fetch.min.bytes=50000

Cheers!
Francois