Bill Bejeck created KAFKA-20663:
-----------------------------------

             Summary: KIP-1035: stale persisted changelog offset causes OffsetOutOfRangeException/TaskCorruptedException on restart
                 Key: KAFKA-20663
                 URL: https://issues.apache.org/jira/browse/KAFKA-20663
             Project: Kafka
          Issue Type: Bug
          Components: streams
    Affects Versions: 4.3.0, 4.3.1, 4.4.0
            Reporter: Bill Bejeck
            Assignee: Bill Bejeck
             Fix For: 4.3.1


In 4.3, KIP-1035 moved the changelog offset into RocksDB and removed the forced
flush on commit, so the persisted offset now becomes durable only through an
organic memtable flush or a clean close. The persisted offset can therefore go
stale, either after an unclean exit or after a clean shutdown followed by
changelog truncation/compaction while the instance is down. If the changelog
log-start offset has advanced past the stale offset, the restore consumer seeks
out of range and throws OffsetOutOfRangeException, which Streams converts into
a TaskCorruptedException (full local-state wipe and rebuild). This happens far
more often than in 4.2, where the forced flush kept the persisted offset no
more than roughly commit.interval.ms stale. Both at-least-once and
exactly-once are affected, and windowed/segmented stores are hit hardest.
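A minimal, hypothetical model of the failure mode, written in Python for
brevity. It uses no Kafka API, and the function and exception names below are
illustrative assumptions, not Streams internals. It contrasts which offset
survives an unclean exit when every commit forces a flush (approximating 4.2)
with the case where only organic memtable flushes make it durable
(approximating 4.3). It then applies the lower-bound check that makes the
restore seek fail once the log-start offset has moved past the persisted
offset.

```python
from __future__ import annotations

from typing import Optional


class OffsetOutOfRangeError(Exception):
    """Hypothetical stand-in for Kafka's OffsetOutOfRangeException."""


def persisted_offset_after_crash(
    commits: list[int], organic_flushes: set[int], flush_on_commit: bool
) -> Optional[int]:
    """Changelog offset left in the local store after an unclean exit (toy model).

    flush_on_commit=True approximates 4.2, where every commit forced a flush.
    flush_on_commit=False approximates 4.3, where a committed offset becomes
    durable only if an organic memtable flush happened at that commit
    (modelled as membership in organic_flushes).
    """
    if flush_on_commit:
        return commits[-1] if commits else None
    durable = [offset for offset in commits if offset in organic_flushes]
    return durable[-1] if durable else None


def restore_seek_offset(persisted_offset: int, log_start_offset: int) -> int:
    """Offset restoration would resume from; only the lower bound is modelled."""
    if persisted_offset < log_start_offset:
        # In Streams this surfaces as a TaskCorruptedException: wipe and rebuild.
        raise OffsetOutOfRangeError(
            f"persisted offset {persisted_offset} is below "
            f"log-start offset {log_start_offset}"
        )
    return persisted_offset


commits = [1000, 2000, 3000, 9000]
print(persisted_offset_after_crash(commits, {1000}, flush_on_commit=True))   # 9000
print(persisted_offset_after_crash(commits, {1000}, flush_on_commit=False))  # 1000

try:
    restore_seek_offset(1000, log_start_offset=5000)
except OffsetOutOfRangeError as err:
    # prints "corrupted: persisted offset 1000 is below log-start offset 5000"
    print("corrupted:", err)
```

In this sketch the 4.3-style value (1000) falls below a log-start offset of
5000 and the seek fails, while the 4.2-style value (9000) would still be in
range. The clean-shutdown case is the same check: the persisted offset is
current at shutdown, but truncation/compaction moves the log-start offset past
it while the instance is down.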



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
