Great change for the config name. The new one is much better.
-Matthias
On 2/13/26 1:56 PM, Uladzislau Blok wrote:
Yes, I agree with that. I have updated the KIP accordingly.
One small detail I want to fix is the naming of the new property.
The proposal was state.cleanup.on.start.delay.ms, but the value is not
really a 'delay'; it is more the 'max age' of a directory.
I propose a slightly different name: state.cleanup.dir.max.age.ms
This name, from my perspective, better describes the new functionality
while still showing the relation to 'state.cleanup.delay.ms'.
The KIP is updated.
On Fri, Feb 13, 2026 at 10:22 PM Matthias J. Sax <[email protected]> wrote:
One more thing came to mind. We should also define the "importance
level" of the new config.
Given that it's kind of an edge case, maybe "low" would be the right
pick? Thoughts?
The KIP should include this information.
-Matthias
On 2/13/26 11:56 AM, Matthias J. Sax wrote:
Thanks for updating the KIP. LGTM
I don't have any further comments. While we wait for others to chime in,
too, I think you can start a vote in parallel.
-Matthias
On 1/28/26 4:42 AM, Uladzislau Blok wrote:
Hello Matthias,
Thank you for the feedback.
I really like the proposal to change state.cleanup.on.start from a
boolean to a long (with a default of -1). Do we need to change the name
then?
Proposal: state.cleanup.on.start.delay.ms
Decoupling this from state.cleanup.delay.ms ensures the new feature
doesn't have unintended side effects. It also gives users the
flexibility to align the cleanup threshold with their
delete.retention.ms settings. For example, if the retention is set to
24 hours, a user could safely set the cleanup property to 20 hours (or
even closer to the retention value).
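To make the 24-hour / 20-hour example concrete, here is a minimal sketch of how a user might derive the threshold from the changelog retention. It uses the property name from the later rename in this thread (state.cleanup.dir.max.age.ms), which is only proposed in the KIP and is not part of a released Kafka version; the class name, helper method, application id, and broker address are all hypothetical.

```java
import java.time.Duration;
import java.util.Properties;

final class CleanupConfigSketch {
    // Assumption: property name as proposed in this thread (KIP-1259);
    // it does not exist in released Kafka versions.
    static final String STATE_CLEANUP_DIR_MAX_AGE_MS = "state.cleanup.dir.max.age.ms";

    /**
     * Builds Streams properties where the startup cleanup threshold is the
     * changelog's delete.retention.ms minus a safety margin.
     */
    static Properties streamsProps(Duration deleteRetention, Duration safetyMargin) {
        if (safetyMargin.isNegative() || safetyMargin.compareTo(deleteRetention) >= 0) {
            throw new IllegalArgumentException("margin must be in [0, retention)");
        }
        Properties props = new Properties();
        props.setProperty("application.id", "example-app");        // hypothetical value
        props.setProperty("bootstrap.servers", "localhost:9092");  // hypothetical value
        long maxAgeMs = deleteRetention.minus(safetyMargin).toMillis();
        props.setProperty(STATE_CLEANUP_DIR_MAX_AGE_MS, Long.toString(maxAgeMs));
        return props;
    }
}
```

With a retention of 24 hours and a margin of 4 hours, the threshold comes out at 72,000,000 ms, i.e. 20 hours.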
Regarding the global store case, I believe this approach helps there as
well. Even if a less-frequently updated global store is wiped, it would
only occur according to the specific threshold the user has defined,
which is a manageable trade-off.
I have updated the KIP accordingly.
Best regards,
Uladzislau Blok
On Tue, Jan 27, 2026 at 8:19 AM Matthias J. Sax <[email protected]> wrote:
Thanks for raising both points.
The global store one is tricky. Not sure atm. The good thing is, of
course, that this new feature is disabled by default. Maybe it would be
sufficient to call out this edge case in the docs explicitly, calling
for caution, but leave it up to the user to decide? -- Maybe others have
some ideas?
About increasing `state.cleanup.delay.ms` -- I am not convinced it
would be a good idea. I would propose two alternatives.
  - extend the docs to tell users to consider increasing this config if
    they use this new feature
  - change `state.cleanup.on.start` from a boolean to a long, with
    default value `-1` (for disabled), and let users decide what age
    threshold they want to apply when enabling the feature, effectively
    decoupling the new feature from the `state.cleanup.delay.ms` config.
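The semantics of the second alternative can be sketched as a small decision function. This is only an illustration of "a long with `-1` meaning disabled"; the class and method names are hypothetical, and treating every negative value as disabled and using a strict "older than" comparison are assumptions the thread does not settle.

```java
final class StartupCleanupPolicy {
    /** Default value from the proposal: the feature is off. */
    static final long DISABLED = -1L;

    /**
     * Returns true if a state directory should be wiped on startup.
     *
     * @param maxAgeMs       configured age threshold, or a negative value for disabled
     * @param lastModifiedMs last modification time of the directory (epoch millis)
     * @param nowMs          current time (epoch millis)
     */
    static boolean shouldWipe(long maxAgeMs, long lastModifiedMs, long nowMs) {
        if (maxAgeMs < 0) {
            return false; // assumption: any negative value disables the feature
        }
        // assumption: strictly older than the threshold
        return nowMs - lastModifiedMs > maxAgeMs;
    }
}
```

Because the threshold is passed in on its own, nothing here depends on `state.cleanup.delay.ms`, which is the decoupling the alternative aims for.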
Thoughts?
-Matthias
On 1/18/26 11:01 AM, Uladzislau Blok wrote:
Hello Matthias,
Thanks for the feedback on the KIP.
It seems we had a slight misunderstanding regarding the cleanup logic,
but after revisiting the ticket and the existing codebase, your
suggestion to wipe stores older than state.cleanup.delay.ms makes
perfect sense. I have updated the KIP accordingly, and it is now ready
for a second round of review.
I would like to highlight two specific points for further discussion:
  - This proposal might cause global stores to be deleted if they aren't
    updated often. Currently, we check the last modification time of the
    directory. If a global table hasn't changed, it might be cleaned up
    even if the data is still valid. However, since these tables are
    usually small, this might not be a major issue. What do you think?
  - We previously discussed increasing the default value for
    state.cleanup.delay.ms to be less aggressive. Do we have any
    consensus on a reasonable default, or a recommended methodology for
    measuring what this value should be?
Regards,
Uladzislau Blok.
On Mon, Jan 12, 2026 at 2:55 AM Matthias J. Sax <[email protected]> wrote:
Thanks for the KIP, Uladzislau.
Given that you propose to wipe the entire state if this config is set,
I am wondering if we would need such a config to begin with, or if
users could implement this themselves (via some custom config the
application code uses) and call `KafkaStreams#cleanUp()` to wipe out
all local state if this custom config is set?
I seem to remember from the original ticket discussion that the idea
was not to blindly wipe the entire state, but to still do it based on
task directory age, similar to what the background cleaner thread does
(based on the `state.cleanup.delay.ms` config), and to trigger a
cleanup run before startup. Thoughts?
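For reference, the "users could implement this themselves" option could look roughly like the sketch below. The config key `myapp.wipe.state.on.start` and the class are hypothetical application-level names, not Kafka Streams configs. The cleanup action is passed in as a `Runnable` so the sketch compiles without the Kafka Streams dependency.

```java
import java.util.Properties;

final class CustomStartupWipe {
    // Hypothetical application-level flag; not a Kafka Streams config.
    static final String WIPE_ON_START = "myapp.wipe.state.on.start";

    /**
     * Runs the given cleanup action if the custom flag is set to "true".
     * Returns whether the cleanup was triggered.
     */
    static boolean maybeWipe(Properties appProps, Runnable cleanUp) {
        boolean wipe = Boolean.parseBoolean(appProps.getProperty(WIPE_ON_START, "false"));
        if (wipe) {
            cleanUp.run(); // e.g. streams::cleanUp in a real application
        }
        return wipe;
    }
}
```

In a real application one would call `CustomStartupWipe.maybeWipe(props, streams::cleanUp)` before `streams.start()`, since `KafkaStreams#cleanUp()` may only be called while the instance is not running. Note that this wipes all local state regardless of directory age, which is exactly the difference from the age-based variant discussed above.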
-Matthias
On 12/21/25 6:37 AM, Uladzislau Blok wrote:
Hi everyone,
I'd like to start a discussion on *KIP-1259: Add configuration to wipe
local state on startup*.
Problem
Currently, Kafka Streams can encounter a "zombie data" issue when an
instance restarts using stale local files after a period exceeding the
changelog topic's delete.retention.ms. If the local checkpoint offset
is still within the broker's available log range (due to long-lived
entities), an automatic reset isn't triggered. However, since the
broker has already purged deletion tombstones, the state store is
rehydrated without the "delete" instructions, causing previously
deleted entities to unexpectedly reappear in the local RocksDB.
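The problem can be illustrated with a toy simulation. It is a simplified model using in-memory maps, not actual Kafka Streams restore code; all class, method, and key names are hypothetical. The assumed history is: offset 0 writes a long-lived key, offset 1 writes "b", the instance checkpoints at offset 2 and goes dormant, and offset 2 is a tombstone for "b" that the broker later purges.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ZombieDataSketch {
    /** One changelog record; a null value models a tombstone. */
    static final class ChangelogRecord {
        final long offset;
        final String key;
        final String value;

        ChangelogRecord(long offset, String key, String value) {
            this.offset = offset;
            this.key = key;
            this.value = value;
        }
    }

    /** Applies all records at or after fromOffset on top of a copy of the store. */
    static Map<String, String> restore(Map<String, String> store,
                                       List<ChangelogRecord> log,
                                       long fromOffset) {
        Map<String, String> result = new HashMap<>(store);
        for (ChangelogRecord r : log) {
            if (r.offset < fromOffset) {
                continue;
            }
            if (r.value == null) {
                result.remove(r.key);
            } else {
                result.put(r.key, r.value);
            }
        }
        return result;
    }

    /**
     * The changelog as the broker serves it after compaction and after
     * delete.retention.ms has passed: "b" and its tombstone are gone, while
     * the long-lived key at offset 0 keeps the checkpoint offset in range.
     */
    static List<ChangelogRecord> compactedLogAfterTombstonePurge() {
        return List.of(new ChangelogRecord(0L, "long-lived", "x"));
    }
}
```

Resuming from checkpoint offset 2 with the stale local store {long-lived=x, b=2} applies no records, so "b" survives as a zombie. Restoring into an empty store from offset 0 yields only {long-lived=x}.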
Proposed Solution
I propose introducing a new configuration, state.cleanup.on.start
(Boolean, default: false). When enabled, this property forces the
deletion of all local state directories and checkpoint files during
application initialization. This ensures the state is rebuilt entirely
from the changelog (the broker's "source of truth"), effectively
purging any expired zombie records.
This is particularly useful for environments with persistent volumes
where instances might remain dormant for long periods (e.g.,
multi-region failover).
*KIP Link: *
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1259%3A+Add+configuration+to+wipe+Kafka+Streams+local+state+on+startup
I look forward to your feedback and suggestions.
Best regards,
Uladzislau Blok