Created a ticket for this: https://issues.apache.org/jira/browse/KAFKA-20685

On Thu, Jun 11, 2026 at 10:35 PM esiva <[email protected]> wrote:

> Hi all,
>
> I was crash testing a Kafka Streams 4.1.2 app (exactly_once_v2, default
> state updater, persistent RocksDB stores, commit.interval.ms=500, running
> on Kubernetes StatefulSets with persistent volumes) and found behavior that
> doesn't match my understanding of the EOS state store model. Wanted a
> sanity check from people who know the internals before filing a JIRA. Also
> my JIRA account request is pending on selfserve (username: zoro3010200,
> email: [email protected]), if a PMC member can approve it that
> would help a lot.
>
> What I observed (reproduced twice):
>
> 1. After restoration completes, a .checkpoint file exists in the task
> directory and stays there unchanged for the whole RUNNING session. Frozen
> restore-time offsets, mtime never moves even while processing around 16k
> rec/s.
> 2. SIGKILL during active processing, zero grace period. On restart the
> instance finds that stale checkpoint, logs "State store X initialized from
> checkpoint with offset ...", no TaskCorruptedException, no wipe, and just
> replays the changelog tail.
> 3. The wipe path only fired in a different test where the crash happened
> during restoration itself, before any checkpoint entry existed.
>
> Code paths I traced on the 4.1 branch:
>
> - DefaultStateUpdater.maybeCompleteRestoration calls
> task.maybeCheckpoint(true) on restore completion with no EOS condition.
> - Nothing deletes that file on the restored to RUNNING transition.
> StreamTask.completeRestoration writes its own checkpoint only if
> !eosEnabled, which makes me think the intent was that no checkpoint should
> exist past that point under EOS.
> - During RUNNING under EOS, postCommit only checkpoints when
> enforceCheckpoint=true, so the file never gets refreshed or removed in
> steady state.
> - The only delete sites I could find: init time
> (ProcessorStateManager.initializeStoreOffsetsFromCheckpoint), resume from
> SUSPENDED (KAFKA-10362), and removeCheckpointForCorruptedTask.
>
> Why I think this matters: RocksDBStore disables the WAL and Streams
> doesn't flush on commit under EOS, so a background memtable flush landing
> mid transaction can persist writes from a transaction that later gets
> aborted. The tail replay runs read_committed and skips those aborted
> records, so it can't clean them up. Reprocessing of the uncommitted input
> offsets papers over it for deterministic input driven writes, but state
> written by wall clock punctuators never gets regenerated, and
> read-before-write logic (dedup on an ID for example) can read the leftover
> value and silently skip a redelivered record. The KIP-892 motivation
> section says EOS has to wipe task state on crash because data gets written
> to the store before the changelog commit completes, and this behavior
> bypasses that wipe.
>
> KAFKA-10362 fixed what looks like the same class of issue on the resume
> path with an explicit deleteCheckPointFileIfEOSEnabled(). The restored to
> RUNNING transition seems to be missing the same delete ever since the state
> updater became the default.
>
> So my questions: is this known? Intended? Is there some safety argument
> I'm missing? If it's a real gap I'm happy to file the JIRA with full logs
> and disk evidence from the crash tests, and help test a fix.
>
> Thanks,
>
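
In case it helps anyone reproduce observation 1 from the quoted message (the .checkpoint file staying frozen during RUNNING), here is a minimal Python sketch for snapshotting checkpoint files on disk and comparing two snapshots. It is only a sketch under assumptions: the function names are made up for illustration, it assumes task directories sit directly under <state.dir>/<application.id>/, and it assumes the checkpoint file is plain text with a version line, an entry-count line, and then one "topic partition offset" line per changelog partition. Please verify both assumptions against your own state directory before relying on it.

```python
from pathlib import Path


def read_checkpoint(path):
    """Parse a .checkpoint file into {(topic, partition): offset}.

    Assumed layout: version line, entry-count line, then one
    "topic partition offset" line per changelog partition.
    """
    offsets = {}
    lines = Path(path).read_text().splitlines()
    for line in lines[2:]:  # skip the version and entry-count lines
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore anything that does not look like an entry
        topic, partition, offset = parts
        offsets[(topic, int(partition))] = int(offset)
    return offsets


def snapshot(app_state_dir):
    """Return {task_dir_name: (mtime_ns, offsets)} for each task directory
    under app_state_dir that currently contains a .checkpoint file."""
    result = {}
    for entry in sorted(Path(app_state_dir).iterdir()):
        checkpoint = entry / ".checkpoint"
        if entry.is_dir() and checkpoint.is_file():
            result[entry.name] = (
                checkpoint.stat().st_mtime_ns,
                read_checkpoint(checkpoint),
            )
    return result


def unchanged_tasks(before, after):
    """Task directories whose checkpoint has the same mtime and the same
    offsets in both snapshots."""
    return sorted(t for t in before if t in after and before[t] == after[t])
```

Taking one snapshot right after restoration completes and another a few minutes into steady-state processing, then calling unchanged_tasks(first, second), should list every task whose checkpoint never moved. Tasks that are missing from the second snapshot had their checkpoint deleted in between.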
