One more thing came to mind. We should also define the "importance level" of the new config.

Given that it's kind of an edge case, maybe "low" would be the right pick? Thoughts?

The KIP should include this information.
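
For reference, the importance level is just the `Importance` constant passed
when the config is defined. A rough sketch against `ConfigDef` (the name,
default, and doc string below are placeholders, not final KIP wording):

    // Hypothetical sketch: registering the new config with importance LOW.
    import org.apache.kafka.common.config.ConfigDef;

    ConfigDef config = new ConfigDef()
        .define("state.cleanup.on.start.delay.ms", // proposed name from this thread
                ConfigDef.Type.LONG,
                -1L,                               // -1 = feature disabled (default)
                ConfigDef.Importance.LOW,          // the "low" pick discussed above
                "Minimum age of a task directory before it is wiped on startup; "
                    + "-1 disables the feature.");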


-Matthias

On 2/13/26 11:56 AM, Matthias J. Sax wrote:
Thanks for updating the KIP. LGTM

I don't have any further comments. While we wait for others to chime in, too, I think you can start a vote in parallel.


-Matthias

On 1/28/26 4:42 AM, Uladzislau Blok wrote:
Hello Matthias,

Thank you for the feedback.

I really like the proposal to change state.cleanup.on.start from a boolean
to a long (with a default of -1). Do we need to change the naming then?
Proposal: state.cleanup.on.start.delay.ms

Decoupling this from state.cleanup.delay.ms ensures the new feature doesn't
have unintended side effects. It also gives users the flexibility to align
the cleanup threshold with their delete.retention.ms settings. For example,
if the retention is set to 24 hours, a user could safely set the cleanup
property to 20 hours (or even closer to the retention value).
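
To make that example concrete, a sketch assuming the proposed name is
adopted (the "topic." prefix is the existing Streams mechanism for
overriding configs of internal topics such as changelogs):

    import java.util.Properties;

    // Sketch: align the startup-cleanup threshold with tombstone retention.
    Properties props = new Properties();
    // Changelog topics keep delete tombstones for 24 hours:
    props.put("topic.delete.retention.ms",
              String.valueOf(24 * 60 * 60 * 1000L));
    // Wipe task directories older than 20 hours on startup,
    // safely below the 24-hour tombstone retention:
    props.put("state.cleanup.on.start.delay.ms",  // proposed config name
              String.valueOf(20 * 60 * 60 * 1000L));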

Regarding the global store case, I believe this approach helps there as
well. Even if a less-frequently updated global store is wiped, it would
only occur according to the specific threshold the user has defined, which
is a manageable trade-off.

I have updated the KIP accordingly.

Best regards,
Uladzislau Blok

On Tue, Jan 27, 2026 at 8:19 AM Matthias J. Sax <[email protected]> wrote:

Thanks for raising both points.

The global store one is tricky. Not sure atm. The good thing is, of
course, that this new feature is disabled by default. Maybe it would be
sufficient to call out this edge case in the docs explicitly, calling
for caution, but leave it up to the user to decide? -- Maybe others have
some ideas?


About increasing `state.cleanup.delay.ms` -- I am not convinced it would
be a good idea. I would propose two alternatives.

   - extend the docs to tell users to consider increasing this config if
they use the new feature

   - change `state.cleanup.on.start` from a boolean to a long, with
default value `-1` (for disabled), and let users decide what age
threshold they want to apply when enabling the feature, effectively
decoupling the new feature from the `state.cleanup.delay.ms` config
(see the sketch below)
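
For the second alternative, the startup check could look roughly like this
(a sketch only, with hypothetical method names, not the actual Streams
internals):

    import java.io.File;

    // Sketch: wipe task directories older than the configured threshold,
    // mirroring what the background cleaner does with state.cleanup.delay.ms.
    static void maybeCleanupOnStart(File stateDir, long cleanupOnStartDelayMs) {
        if (cleanupOnStartDelayMs < 0) {
            return; // -1 (default) = feature disabled
        }
        File[] taskDirs = stateDir.listFiles(File::isDirectory);
        if (taskDirs == null) {
            return;
        }
        long now = System.currentTimeMillis();
        for (File taskDir : taskDirs) {
            if (now - taskDir.lastModified() > cleanupOnStartDelayMs) {
                deleteRecursively(taskDir);
            }
        }
    }

    // Hypothetical helper; Streams has its own utilities for this.
    static void deleteRecursively(File f) {
        File[] children = f.listFiles();
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        f.delete();
    }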

Thoughts?


-Matthias

On 1/18/26 11:01 AM, Uladzislau Blok wrote:
Hello Matthias,

Thanks for the feedback on the KIP.

It seems we had a slight misunderstanding regarding the cleanup logic,
but after revisiting the ticket and the existing codebase, your
suggestion to wipe stores older than state.cleanup.delay.ms makes
perfect sense. I have updated the KIP accordingly, and it is now ready
for a second round of review.

I would like to highlight two specific points for further discussion:

   - This proposal might cause global stores to be deleted if they
     aren't updated often. Currently, we check the last modification
     time of the directory. If a global table hasn't changed, it might
     be cleaned up even if the data is still valid. However, since
     these tables are usually small, this might not be a major issue.
     What do you think?

   - We previously discussed increasing the default value for
     state.cleanup.delay.ms to be less aggressive. Do we have any
     consensus on a reasonable default, or a recommended methodology
     for measuring what this value should be?

Regards,
Uladzislau Blok.

On Mon, Jan 12, 2026 at 2:55 AM Matthias J. Sax <[email protected]> wrote:

Thanks for the KIP Uladzislau.

Given that you propose to wipe the entire state if this config is set,
I am wondering if we would need such a config to begin with, or if
users could implement this themselves (via some custom config the
application code uses) and call `KafkaStreams#cleanUp()` to wipe out
all local state if this custom config is set?
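
Concretely, the user-side version would be something like this (a sketch;
`wipe.state.on.start` is a made-up application-level property here, not a
Streams config):

    import java.util.Properties;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;

    Properties props = new Properties();
    props.put("application.id", "my-app");
    props.put("bootstrap.servers", "localhost:9092");

    StreamsBuilder builder = new StreamsBuilder();
    // ... build the topology ...

    KafkaStreams streams = new KafkaStreams(builder.build(), props);
    // Custom, application-level flag -- not interpreted by Streams itself:
    if (Boolean.parseBoolean(System.getProperty("wipe.state.on.start", "false"))) {
        streams.cleanUp(); // only valid before start(); wipes all local state
    }
    streams.start();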

If I remember the original ticket discussion correctly, the idea was
not to blindly wipe the entire state, but to still do it based on task
directory age, similar to what the background cleaner thread does
(based on the `state.cleanup.delay.ms` config), and to trigger a
cleanup run before startup. Thoughts?


-Matthias

On 12/21/25 6:37 AM, Uladzislau Blok wrote:
Hi everyone,

I'd like to start a discussion on *KIP-1259: Add configuration to wipe
local state on startup*.

Problem

Currently, Kafka Streams can encounter a "zombie data" issue when an
instance restarts using stale local files after a period exceeding the
changelog topic's delete.retention.ms. If the local checkpoint offset
is still within the broker's available log range (due to long-lived
entities), an automatic reset isn't triggered. However, since the
broker has already purged the deletion tombstones, the state store is
rehydrated without the "delete" instructions, causing previously
deleted entities to unexpectedly reappear in the local RocksDB.
Proposed Solution

I propose introducing a new configuration, state.cleanup.on.start
(Boolean, default: false). When enabled, this property forces the
deletion of all local state directories and checkpoint files during
application initialization. This ensures the state is rebuilt entirely
from the changelog (the broker's "source of truth"), effectively
purging any expired zombie records.
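
Enabling the flag would be a one-liner (a sketch of the boolean form
proposed here):

    import java.util.Properties;

    Properties props = new Properties();
    props.put("application.id", "my-app");
    props.put("bootstrap.servers", "localhost:9092");
    // Proposed config: wipe all local state and checkpoints on startup.
    props.put("state.cleanup.on.start", "true");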

This is particularly useful for environments with persistent volumes
where instances might remain dormant for long periods (e.g.,
multi-region failover).

*KIP Link:*
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1259%3A+Add+configuration+to+wipe+Kafka+Streams+local+state+on+startup


I look forward to your feedback and suggestions.


Best regards,

Uladzislau Blok
