Igor Baltiyskiy created KAFKA-9912: -------------------------------------- Summary: Kafka deletes full log on startup written 30mins ago - wrongly claims retention time 7d breach Key: KAFKA-9912 URL: https://issues.apache.org/jira/browse/KAFKA-9912 Project: Kafka Issue Type: Bug Components: log cleaner Affects Versions: 2.1.1 Environment: Image solsson/kafka:2.1.1@sha256:8bc8242c649c395ab79d76cc83b1052e63b4efea7f83547bf11eb3ef5ea6f8e1 deployed in Kubernetes v 1.12.7 (minikube v1.4.0) 1 broker Reporter: Igor Baltiyskiy Attachments: kafka-logs.txt
This isn't easily reproduced, so bear with me. I was experimenting with the resilience of my Kafka-based app by shutting down Kafka via {{kubectl scale --replicas 0}} — as far as I understand, it sends {{SIGTERM}} to PID 1 of this container, which is {{java ... kafka.Kafka}}. I've got it into state where {{kafka-console-consumer.sh}} cannot read anything from some topics, but can from some others. So, this works: {code:java} bin/kafka-console-consumer.sh --bootstrap-server kafka:9092 --topic increasing-topic-0 --partition 0 --from-beginning --max-messages 5 1 2 3 4 5 Processed a total of 5 messages {code} And this command just hangs: {code:java} bin/kafka-console-consumer.sh --bootstrap-server kafka:9092 --topic increasing-topic-7 --partition 0 --from-beginning --max-messages 5 {code} Even though I know there should be plenty of messages in the topic, produced not more than an hour ago: {code:java} root@kafka-client-solsson:/opt/kafka# bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list kafka:9092 --topic increasing-topic-0 increasing-topic-0:0:500000 root@kafka-client-solsson:/opt/kafka# bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list kafka:9092 --topic increasing-topic-7 increasing-topic-7:0:321455 {code} What I see in the broker logs: {code:java} [2020-04-23 19:20:10,706] INFO [Log partition=increasing-topic-7-0, dir=/var/lib/kafka/data/topics] Found deletable segments with base offsets [0] due to retention time 604800000ms breach (kafka.log.Log) [2020-04-23 19:20:10,712] INFO [Log partition=increasing-topic-7-0, dir=/var/lib/kafka/data/topics] Rolled new log segment at offset 321455 in 5 ms. (kafka.log.Log) [2020-04-23 19:20:10,712] INFO [Log partition=increasing-topic-7-0, dir=/var/lib/kafka/data/topics] Scheduling log segment [baseOffset 0, size 4513814] for deletion. (kafka.log.Log) [2020-04-23 19:20:10,712] INFO [Log partition=increasing-topic-7-0, dir=/var/lib/kafka/data/topics] Incrementing log start offset to 321455 (kafka.log.Log) [2020-04-23 19:21:10,712] INFO [Log partition=increasing-topic-7-0, dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log) [2020-04-23 19:21:10,713] INFO Deleted log /var/lib/kafka/data/topics/increasing-topic-7-0/00000000000000000000.log.deleted. (kafka.log.LogSegment) [2020-04-23 19:21:10,713] INFO Deleted offset index /var/lib/kafka/data/topics/increasing-topic-7-0/00000000000000000000.index.deleted. (kafka.log.LogSegment) [2020-04-23 19:21:10,713] INFO Deleted time index /var/lib/kafka/data/topics/increasing-topic-7-0/00000000000000000000.timeindex.deleted. (kafka.log.LogSegment){code} As you see, Kafka deletes up to 321455 messages — exactly how many there are in this topic. I wouldn't expect Kafka to do that. Note that in my test, I ran a script that would switch Kafka on and then after a random period of time off. Maybe that was what caused this bug to happen. For the record, that's how much timed passed between on/off switches: {code:java} sleeping 29575 ms before switching Kafka off statefulset.apps/kafka-a scaled Kafka switched off sleeping 257347 ms before switching Kafka on statefulset.apps/kafka-a scaled kafka switched on sleeping 6388 ms before switching Kafka off statefulset.apps/kafka-a scaled Kafka switched off sleeping 214066 ms before switching Kafka on statefulset.apps/kafka-a scaled kafka switched on sleeping 1095 ms before switching Kafka off statefulset.apps/kafka-a scaled Kafka switched off sleeping 155721 ms before switching Kafka on statefulset.apps/kafka-a scaled kafka switched on sleeping 8689 ms before switching Kafka off statefulset.apps/kafka-a scaled Kafka switched off sleeping 188854 ms before switching Kafka on statefulset.apps/kafka-a scaled kafka switched on sleeping 39113 ms before switching Kafka off{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)