Shane created KAFKA-12378:
-----------------------------

             Summary: If a broker is down for more than `delete.retention.ms`, deleted records in a compacted topic can come back.
                 Key: KAFKA-12378
                 URL: https://issues.apache.org/jira/browse/KAFKA-12378
             Project: Kafka
          Issue Type: Bug
            Reporter: Shane

If the leader of a compacted topic goes offline, or falls behind in replication, for longer than the topic's `delete.retention.ms`, records that have been tombstoned can come back once that broker catches up and becomes the leader again.

Example of this happening:

Topic config:
  name: compacted-topic
  settings: delete.retention.ms=0
  Leader: broker 1
  ISR: broker 1, broker 2, broker 3

Producer 1 writes a record `1:foo`
Producer 1 writes a record `2:bar`
broker 1 goes offline
broker 2 takes over leadership
Producer 1 writes a tombstone `1:NULL`
broker 2 compacts the topic, which leaves the topic with `1:NULL` and `2:bar` in it
broker 2 removes the tombstone, leaving just `2:bar` in the topic
broker 1 comes back online, catches up with replication, and takes back leadership
broker 1 now has `1:foo` and `2:bar` as its data, since the tombstone was deleted before broker 1 could replicate it

At this point the topic is in an inconsistent state, as the brokers hold conflicting data.

Suggestion:

I believe this to be quite a hard problem to solve, so I'm not going to suggest any large changes to the codebase, but I think a warning in the docs about `delete.retention.ms` is warranted. Adding something that calls out that brokers are also consumers here: [https://docs.confluent.io/platform/current/installation/configuration/topic-configs.html#topicconfigs_delete.retention.ms] would be helpful, and further documentation about what happens when a broker is offline for more than `delete.retention.ms` would be even better.

If it helps, I'm happy to write a first draft of the docs update as well.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
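
The sequence in the description can be walked through with a small toy model. This is only a sketch of the reported scenario, not Kafka code: the log representation and the helper names (`compact`, `catch_up`, `materialize`) are made up for illustration, and it assumes a returning replica only fetches entries past its own log end offset.

```python
# Toy model of the reported sequence. Each replica log is a list of
# (offset, key, value) entries; value None stands for a tombstone.
# All names here are hypothetical and do not mirror Kafka internals.

def compact(log, remove_tombstones):
    """Keep the latest entry per key; optionally drop tombstones too."""
    latest = {}
    for offset, key, value in log:
        latest[key] = (offset, key, value)
    kept = sorted(latest.values())  # offsets are unique, so this orders by offset
    if remove_tombstones:
        kept = [entry for entry in kept if entry[2] is not None]
    return kept

def catch_up(follower_log, leader_log):
    """Assumption: the follower only fetches entries past its own log end."""
    next_offset = follower_log[-1][0] + 1 if follower_log else 0
    return follower_log + [e for e in leader_log if e[0] >= next_offset]

def materialize(log):
    """Replay a log into the key/value view a consumer would end up with."""
    state = {}
    for _, key, value in log:
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state

# Producer 1 writes 1:foo and 2:bar; both replicas have them.
broker1 = [(0, "1", "foo"), (1, "2", "bar")]
broker2 = list(broker1)

# broker 1 goes offline; broker 2 leads and receives the tombstone 1:NULL.
broker2.append((2, "1", None))

# broker 2 compacts, then (delete.retention.ms=0) removes the tombstone.
broker2 = compact(broker2, remove_tombstones=False)
broker2 = compact(broker2, remove_tombstones=True)

# broker 1 returns and catches up; there is nothing left to fetch.
broker1 = catch_up(broker1, broker2)

print("broker 1:", materialize(broker1))
print("broker 2:", materialize(broker2))
```

In this model broker 1 still holds `1:foo` after catching up while broker 2 holds only `2:bar`, which matches the conflicting state described above.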