zoro30102000 opened a new pull request, #22548:
URL: https://github.com/apache/kafka/pull/22548

   Under exactly_once_v2 the default state updater writes an enforced 
checkpoint when a task finishes restoration 
(DefaultStateUpdater.maybeCompleteRestoration) and nothing removes that file 
when the task transitions to RUNNING. StreamTask.completeRestoration only skips 
writing its own checkpoint under EOS, which suggests the intent was that no 
checkpoint should exist past that point. The leftover file survives the whole 
processing session, so an unclean crash during RUNNING finds it on restart, 
initializes the stores from it and skips the TaskCorruptedException wipe. 
RocksDB runs with the WAL disabled and Streams does not flush on commit under 
EOS, so background memtable flushes can persist writes from a transaction that 
is later aborted, and the read_committed tail replay cannot remove them.
   
   This change deletes the checkpoint file when an EOS task completes 
restoration, before the transition to RUNNING, restoring the invariant that no 
checkpoint exists during RUNNING under EOS. It mirrors what KAFKA-10362 did on 
the resume path with the same helper. The checkpoints written during 
restoration itself are untouched, so a crash during restoration still resumes 
from them, which is safe because restored bytes are committed changelog data.
   
   Trunk and 4.3 are not affected: stores that manage their own offsets detect 
an unclean shutdown through the KIP-1035 status marker, and KAFKA-19712 moved 
the legacy checkpoint file handling into LegacyCheckpointingStateStore, which 
writes the file only under at_least_once and deletes it at init under EOS. This 
PR therefore targets the 4.2 branch.
   
   Testing: three unit tests added to StreamTaskTest next to the existing 
restoration checkpoint tests. The EOS test fails without the fix, the 
at_least_once test verifies the delete does not run there, and a third test 
verifies a failed delete does not prevent the transition to RUNNING. Full 
StreamTaskTest, checkstyle and spotbugs pass on the streams module.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to