Bill Bejeck created KAFKA-20663:
-----------------------------------
Summary: KIP-1035: stale persisted changelog offset causes
OffsetOutOfRangeException/TaskCorruptedException on restart
Key: KAFKA-20663
URL: https://issues.apache.org/jira/browse/KAFKA-20663
Project: Kafka
Issue Type: Bug
Components: streams
Affects Versions: 4.3.0, 4.3.1, 4.4.0
Reporter: Bill Bejeck
Assignee: Bill Bejeck
Fix For: 4.3.1

In 4.3, KIP-1035 moved the changelog offset into RocksDB and removed the forced
flush on commit, so the persisted offset is now made durable only by an organic
memtable flush or a clean close. That offset can go stale in two ways: after an
unclean exit, or after a clean shutdown followed by changelog
truncation/compaction while the instance is down. If the changelog log-start
offset has advanced past the stale offset, the restore consumer seeks out of
range and throws OffsetOutOfRangeException, which Streams converts to a
TaskCorruptedException (full local-state wipe and rebuild). This happens far
more often than in 4.2, where the forced flush kept the persisted offset at most
roughly one commit.interval.ms behind. Both at-least-once and exactly-once are
affected, and windowed/segmented stores are hit hardest.
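
The failure condition described above can be modeled in a few lines. This is a
minimal sketch in plain Java, not Kafka Streams code: the class
StaleOffsetCheck, the Outcome enum, and the classify method are hypothetical
names introduced for illustration, and the sketch covers only the log-start
comparison described in this ticket.

```java
// Hypothetical model of the restore-time outcome; not the Streams implementation.
final class StaleOffsetCheck {

    enum Outcome {
        RESUME,            // persisted offset is still readable from the changelog
        WIPE_AND_REBUILD   // offset is out of range; local state would be discarded
    }

    /**
     * @param persistedOffset changelog offset read back from the local store
     * @param logStartOffset  earliest offset still present in the changelog partition
     */
    static Outcome classify(long persistedOffset, long logStartOffset) {
        // If truncation/compaction moved the log start past the persisted
        // offset, a seek to that offset can no longer be served.
        if (persistedOffset < logStartOffset) {
            return Outcome.WIPE_AND_REBUILD;
        }
        return Outcome.RESUME;
    }

    public static void main(String[] args) {
        // Assumed example values: offset 100 was persisted, then the log start moved to 250.
        System.out.println(classify(100L, 250L));
        // Assumed example values: the persisted offset is still ahead of the log start.
        System.out.println(classify(400L, 250L));
    }
}
```

In this model, the larger the gap between the last durable offset and the
current changelog position, the more likely the first branch is taken after a
restart, which matches the increase in frequency reported relative to 4.2.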