created ticket for the same -
https://issues.apache.org/jira/browse/KAFKA-20685

On Thu, Jun 11, 2026 at 10:35 PM esiva <[email protected]> wrote:

> Hi all,
>
> I was crash testing a Kafka Streams 4.1.2 app (exactly_once_v2, default
> state updater, persistent RocksDB stores, commit.interval.ms=500, running
> on Kubernetes StatefulSets with persistent volumes) and found behavior that
> doesn't match my understanding of the EOS state store model. Wanted a
> sanity check from people who know the internals before filing a JIRA. Also
> my JIRA account request is pending on selfserve (username: zoro3010200,
> email: [email protected]), if a PMC member can approve it that
> would help a lot.
>
> What I observed (reproduced twice):
>
> 1. After restoration completes, a .checkpoint file exists in the task
> directory and stays there unchanged for the whole RUNNING session. Frozen
> restore-time offsets, mtime never moves even while processing around 16k
> rec/s.
> 2. SIGKILL during active processing, zero grace period. On restart the
> instance finds that stale checkpoint, logs "State store X initialized from
> checkpoint with offset ...", no TaskCorruptedException, no wipe, and just
> replays the changelog tail.
> 3. The wipe path only fired in a different test where the crash happened
> during restoration itself, before any checkpoint entry existed.
>
> Code paths I traced on the 4.1 branch:
>
> - DefaultStateUpdater.maybeCompleteRestoration calls
> task.maybeCheckpoint(true) on restore completion with no EOS condition.
> - Nothing deletes that file on the restored to RUNNING transition.
> StreamTask.completeRestoration writes its own checkpoint only if
> !eosEnabled, which makes me think the intent was that no checkpoint should
> exist past that point under EOS.
> - During RUNNING under EOS, postCommit only checkpoints when
> enforceCheckpoint=true, so the file never gets refreshed or removed in
> steady state.
> - The only delete sites I could find: init time
> (ProcessorStateManager.initializeStoreOffsetsFromCheckpoint), resume from
> SUSPENDED (KAFKA-10362), and removeCheckpointForCorruptedTask.
>
> Why I think this matters: RocksDBStore disables the WAL and Streams
> doesn't flush on commit under EOS, so a background memtable flush landing
> mid transaction can persist writes from a transaction that later gets
> aborted. The tail replay runs read_committed and skips those aborted
> records, so it can't clean them up. Reprocessing of the uncommitted input
> offsets papers over it for deterministic input driven writes, but state
> written by wall clock punctuators never gets regenerated, and
> read-before-write logic (dedup on an ID for example) can read the leftover
> value and silently skip a redelivered record. The KIP-892 motivation
> section says EOS has to wipe task state on crash because data gets written
> to the store before the changelog commit completes, and this behavior
> bypasses that wipe.
>
> KAFKA-10362 fixed what looks like the same class of issue on the resume
> path with an explicit deleteCheckPointFileIfEOSEnabled(). The restored to
> RUNNING transition seems to be missing the same delete ever since the state
> updater became the default.
>
> So my questions: is this known? Intended? Is there some safety argument
> I'm missing? If it's a real gap I'm happy to file the JIRA with full logs
> and disk evidence from the crash tests, and help test a fix.
>
> Thanks,
>
>

Reply via email to