Hello! Some work was done in KIP-723, which proposed to stop setting
TCP_NODELAY on all sockets. The metrics provided at the time showed a
great reduction in packets per second in one use case. I am running Kafka on
Kubernetes and see some of the nodes getting overloaded by the number of
packets per second they have to process. Many of these packets are tiny
responses sent in sequence by a Kafka broker (similar to the first capture
from the reproducer, pasted below). I too would like to see a 5000%
reduction of these packets!

The last question in that discussion, 4 years ago, read:

> In order to figure out why you are getting so many small writes to the
> socket, it would help to know a bit more about your configuration. You are
> using SSL, right? I think if you used plaintext, sendfile would avoid this
> issue. What kind of disks are you using?  We may find that increasing
> buffering in the SSL layer would avoid this issue without the drawbacks of
> enabling Nagle's algorithm.

The pull request https://github.com/apache/kafka/pull/10333 was later abandoned.

I managed to reproduce the "many small writes" on the socket using the Docker
image provided on Docker Hub, without TLS (instructions below). tcpdump
shows that a single Kafka response is split over many small packets:

18:25:20.658080 IP 10.90.0.10.9092 > 10.90.0.1.54994: Flags [P.], seq 126699:126773, ack 10506, win 50470, options [nop,nop,TS val 2935541743 ecr 3641178240], length 74
18:25:20.658260 IP 10.90.0.10.9092 > 10.90.0.1.54994: Flags [P.], seq 126773:127443, ack 10506, win 50470, options [nop,nop,TS val 2935541743 ecr 3641178240], length 670
18:25:20.658298 IP 10.90.0.10.9092 > 10.90.0.1.54994: Flags [P.], seq 127443:127499, ack 10506, win 50470, options [nop,nop,TS val 2935541743 ecr 3641178240], length 56
18:25:20.658326 IP 10.90.0.10.9092 > 10.90.0.1.54994: Flags [P.], seq 127499:128169, ack 10506, win 50470, options [nop,nop,TS val 2935541743 ecr 3641178240], length 670
18:25:20.658386 IP 10.90.0.10.9092 > 10.90.0.1.54994: Flags [P.], seq 128169:128225, ack 10506, win 50470, options [nop,nop,TS val 2935541743 ecr 3641178240], length 56
18:25:20.658418 IP 10.90.0.10.9092 > 10.90.0.1.54994: Flags [P.], seq 128225:128955, ack 10506, win 50470, options [nop,nop,TS val 2935541743 ecr 3641178240], length 730
18:25:20.658433 IP 10.90.0.10.9092 > 10.90.0.1.54994: Flags [P.], seq 128955:128958, ack 10506, win 50470, options [nop,nop,TS val 2935541743 ecr 3641178240], length 3
18:25:20.659021 IP 10.90.0.1.54994 > 10.90.0.10.9092: Flags [.], ack 128958, win 40523, options [nop,nop,TS val 3641188245 ecr 2935541743], length 0
18:25:20.661747 IP 10.90.0.1.54994 > 10.90.0.10.9092: Flags [P.], seq 10506:10715, ack 128958, win 41652, options [nop,nop,TS val 3641188247 ecr 2935541743], length 209
18:25:20.661775 IP 10.90.0.10.9092 > 10.90.0.1.54994: Flags [.], ack 10715, win 50470, options [nop,nop,TS val 2935541746 ecr 3641188247], length 0


Now, attaching to the Kafka broker with gdb and forcing TCP_NODELAY back to 0
using

    set $val = (int*)malloc(4)
    p *$val=0
    p (int) setsockopt(202, 6, 1, $val, 4)

we see the traffic is much more efficient:

18:27:41.227352 IP 10.90.0.10.9092 > 10.90.0.1.54994: Flags [P.], seq 159445:159519, ack 13467, win 50470, options [nop,nop,TS val 2935682312 ecr 3641318810], length 74
18:27:41.227637 IP 10.90.0.10.9092 > 10.90.0.1.54994: Flags [.], seq 159519:160967, ack 13467, win 50470, options [nop,nop,TS val 2935682312 ecr 3641318810], length 1448
18:27:41.227710 IP 10.90.0.1.54994 > 10.90.0.10.9092: Flags [.], ack 160967, win 40922, options [nop,nop,TS val 3641328813 ecr 2935682312], length 0
18:27:41.227740 IP 10.90.0.10.9092 > 10.90.0.1.54994: Flags [P.], seq 160967:161704, ack 13467, win 50470, options [nop,nop,TS val 2935682312 ecr 3641328813], length 737
18:27:41.230141 IP 10.90.0.1.54994 > 10.90.0.10.9092: Flags [P.], seq 13467:13676, ack 161704, win 41652, options [nop,nop,TS val 3641328816 ecr 2935682312], length 209
18:27:41.270965 IP 10.90.0.10.9092 > 10.90.0.1.54994: Flags [.], ack 13676, win 50470, options [nop,nop,TS val 2935682356 ecr 3641328816], length 0
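
For readers decoding the gdb call: the first argument to setsockopt() is the
file descriptor (202 in that session), level 6 is IPPROTO_TCP and option 1 is
TCP_NODELAY on Linux. The sketch below is only an illustration of the same
toggle from Python on a fresh, unconnected socket; it does not touch a Kafka
broker, and the helper name set_nodelay is hypothetical.

```python
import socket

# On Linux, level 6 is IPPROTO_TCP and option 1 is TCP_NODELAY,
# i.e. the raw numbers passed to setsockopt() in the gdb session.
assert socket.IPPROTO_TCP == 6
assert socket.TCP_NODELAY == 1


def set_nodelay(sock: socket.socket, enabled: bool) -> bool:
    """Set TCP_NODELAY on sock and return the value read back (hypothetical helper)."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1 if enabled else 0)
    # getsockopt() returns a non-zero integer when the option is enabled.
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("TCP_NODELAY enabled: ", set_nodelay(s, True))
        # Equivalent of the gdb call: value 0 re-enables Nagle's algorithm.
        print("TCP_NODELAY disabled:", not set_nodelay(s, False))
```

With the option at 0, the kernel may coalesce consecutive small writes into
fewer segments, which is consistent with the second capture.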

I think this demonstrates that Kafka does many small writes to the socket even
without TLS. Do you think there is something obviously wrong that would
explain this?
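
To put numbers on the comparison, here is a small tally of the broker-to-client
payload lengths copied from the two captures above. It is only arithmetic on
those excerpts, not a benchmark; the two excerpts are different responses that
happen to carry the same number of bytes, and the variable and function names
are made up for this sketch.

```python
# Payload lengths (bytes) of the broker -> client segments in each capture.
with_nodelay = [74, 670, 56, 670, 56, 730, 3]   # first capture, TCP_NODELAY=1
without_nodelay = [74, 1448, 737]               # second capture, TCP_NODELAY=0


def summarize(lengths):
    """Return (number of segments, total payload bytes)."""
    return len(lengths), sum(lengths)


if __name__ == "__main__":
    for label, lengths in (("TCP_NODELAY=1", with_nodelay),
                           ("TCP_NODELAY=0", without_nodelay)):
        count, total = summarize(lengths)
        print(f"{label}: {count} segments, {total} bytes")
```

Both excerpts carry 2259 bytes of payload (which matches the sequence numbers:
128958 - 126699 and 161704 - 159445), sent as 7 segments in the first capture
and 3 in the second.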


~~~~~
Running the reproducer below on a CentOS Stream 10 VM:

# broker
sudo podman network create --subnet 10.90.0.0/26 \
  --interface-name kafka0 kafkanet
sudo podman run --entrypoint cat apache/kafka:latest \
  /etc/kafka/docker/server.properties > server.properties
sed -i \
  's@^advertised.listeners=.*@advertised.listeners=PLAINTEXT://10.90.0.10:9092@' \
  server.properties
sudo podman run --network kafkanet --ip 10.90.0.10 -p 9092:9092 -p 9093:9093 \
  --rm -d --name broker \
  -v `pwd`/server.properties:/etc/kafka/docker/server.properties:Z \
  apache/kafka:latest
for i in `seq 3` ; do
  sudo podman exec --workdir /opt/kafka/bin/ broker \
    ./kafka-topics.sh --bootstrap-server localhost:9092 --create \
    --topic test-topic-$i
done

# producer1:
sudo podman exec -ti --workdir /opt/kafka/bin/ broker bash -c \
  "while sleep 1; do echo prod1 ; done | ./kafka-console-producer.sh \
  --bootstrap-server localhost:9092 --topic test-topic-1"
# producer2:
sudo podman exec -ti --workdir /opt/kafka/bin/ broker bash -c \
  "while sleep 1; do echo prod2 ; done | ./kafka-console-producer.sh \
  --bootstrap-server localhost:9092 --topic test-topic-2"
# producer3:
sudo podman exec -ti --workdir /opt/kafka/bin/ broker bash -c \
  "while sleep 1; do echo prod3 ; done | ./kafka-console-producer.sh \
  --bootstrap-server localhost:9092 --topic test-topic-3"


# consumer, I'm running that as my user
podman run --network host --rm -ti --workdir /opt/kafka/bin/ --name client \
  --entrypoint bash apache/kafka:latest
./kafka-console-consumer.sh --bootstrap-server 10.90.0.10:9092 \
  --include test-topic-. --consumer-property fetch.max.wait.ms=10000 \
  --consumer-property heartbeat.interval.ms=20000 \
  --consumer-property session.timeout.ms=60000 \
  --consumer-property fetch.min.bytes=50000



Cheers!
Francois
