Hi everyone,

I'd like to start a discussion on *KIP-1259: Add configuration to wipe
local state on startup*.

Problem

Currently, Kafka Streams can encounter a "zombie data" issue when an
instance restarts from stale local files after being offline for longer
than the changelog topic's delete.retention.ms. If the local checkpoint
offset is still within the broker's available log range (for example,
because older records for long-lived entities are still retained), no
automatic reset is triggered. However, the broker has already purged the
deletion tombstones, so the state store is restored without the "delete"
instructions, and previously deleted entities unexpectedly remain in the
local RocksDB.
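
To make the failure mode concrete, here is a minimal sketch in plain Java
that models it. It does not use any Kafka API: the Rec class, the restore
helper, and the offsets are all hypothetical, and the changelog is reduced
to a list of records that survived compaction and tombstone expiry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ZombieRestoreDemo {
    // Hypothetical model of one changelog record; a null value is a tombstone.
    public static final class Rec {
        final long offset;
        final String key;
        final String value;

        public Rec(long offset, String key, String value) {
            this.offset = offset;
            this.key = key;
            this.value = value;
        }
    }

    // Applies every record at or after fromOffset on top of a copy of the
    // given store, which is a simplified picture of a changelog restore.
    public static Map<String, String> restore(
            Map<String, String> store, List<Rec> log, long fromOffset) {
        Map<String, String> result = new TreeMap<>(store);
        for (Rec r : log) {
            if (r.offset < fromOffset) {
                continue;
            }
            if (r.value == null) {
                result.remove(r.key);
            } else {
                result.put(r.key, r.value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stale local store: it still holds "A" because the instance went
        // offline before the delete of "A" (offset 2) was applied.
        Map<String, String> stale = new TreeMap<>();
        stale.put("A", "v1");
        stale.put("B", "v1");
        long checkpoint = 2;

        // Changelog as the broker holds it now: A=v1 (offset 0) was compacted
        // away and the tombstone for "A" (offset 2) has expired.
        List<Rec> compacted = new ArrayList<>();
        compacted.add(new Rec(1, "B", "v1"));
        compacted.add(new Rec(3, "C", "v1"));

        // Resuming from the checkpoint keeps the zombie entry "A".
        System.out.println(restore(stale, compacted, checkpoint));
        // Rebuilding from an empty store does not.
        System.out.println(restore(new TreeMap<>(), compacted, 0));
    }
}
```

In this model the checkpoint (2) is still inside the available offset range
(1 to 3), so nothing signals that the local state is out of date, yet only
the rebuild from an empty store drops "A".
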

Proposed Solution

I propose introducing a new configuration, state.cleanup.on.start (Boolean,
default: false). When enabled, this property forces the deletion of all
local state directories and checkpoint files during application
initialization. This ensures the state is rebuilt entirely from the
changelog, which is the broker's "source of truth", and so purges any
expired zombie records.
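
As a rough sketch of how an application might opt in, assuming the key is
adopted exactly as named above: state.cleanup.on.start is only a proposal
and is not recognized by any released Kafka version, and the application
id, bootstrap servers, and state directory below are placeholder values.

```java
import java.util.Properties;

public class WipeOnStartConfig {
    // Proposed in KIP-1259; hypothetical until the KIP is accepted and released.
    public static final String STATE_CLEANUP_ON_START = "state.cleanup.on.start";

    // Builds Streams-style properties with the proposed flag set.
    public static Properties streamsProps(boolean wipeOnStart) {
        Properties props = new Properties();
        props.put("application.id", "example-app");          // placeholder
        props.put("bootstrap.servers", "localhost:9092");    // placeholder
        props.put("state.dir", "/var/lib/kafka-streams");    // placeholder
        props.put(STATE_CLEANUP_ON_START, Boolean.toString(wipeOnStart));
        return props;
    }

    public static void main(String[] args) {
        Properties props = streamsProps(true);
        System.out.println(
            STATE_CLEANUP_ON_START + "=" + props.getProperty(STATE_CLEANUP_ON_START));
    }
}
```

Without the new configuration, a similar effect is available today by
calling KafkaStreams#cleanUp() in application code before start().
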

This is particularly useful for environments with persistent volumes where
instances might remain dormant for long periods (e.g., multi-region
failover).

*KIP Link:*
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1259%3A+Add+configuration+to+wipe+Kafka+Streams+local+state+on+startup


I look forward to your feedback and suggestions.


Best regards,

Uladzislau Blok
