zoro30102000 opened a new pull request, #22548: URL: https://github.com/apache/kafka/pull/22548
Under exactly_once_v2 the default state updater writes an enforced checkpoint when a task finishes restoration (DefaultStateUpdater.maybeCompleteRestoration) and nothing removes that file when the task transitions to RUNNING. StreamTask.completeRestoration only skips writing its own checkpoint under EOS, which suggests the intent was that no checkpoint should exist past that point. The leftover file survives the whole processing session, so an unclean crash during RUNNING finds it on restart, initializes the stores from it and skips the TaskCorruptedException wipe. RocksDB runs with the WAL disabled and Streams does not flush on commit under EOS, so background memtable flushes can persist writes from a transaction that is later aborted, and the read_committed tail replay cannot remove them. This change deletes the checkpoint file when an EOS task completes restoration, before the transition to RUNNING, restoring the invariant that no checkpoint exists during RUNNING under EOS. It mirrors what KAFKA-10362 did on the resume path with the same helper. The checkpoints written during restoration itself are untouched, so a crash during restoration still resumes from them, which is safe because restored bytes are committed changelog data. Trunk and 4.3 are not affected: stores that manage their own offsets detect an unclean shutdown through the KIP-1035 status marker, and KAFKA-19712 moved the legacy checkpoint file handling into LegacyCheckpointingStateStore, which writes the file only under at_least_once and deletes it at init under EOS. This PR therefore targets the 4.2 branch. Testing: three unit tests added to StreamTaskTest next to the existing restoration checkpoint tests. The EOS test fails without the fix, the at_least_once test verifies the delete does not run there, and a third test verifies a failed delete does not prevent the transition to RUNNING. Full StreamTaskTest, checkstyle and spotbugs pass on the streams module. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
