yzeng1618 commented on issue #10574:
URL: https://github.com/apache/seatunnel/issues/10574#issuecomment-4020702367
Thanks for the detailed report.
Based on code analysis, this looks like a real restore-path issue in the
Kafka source, rather than only a JDBC sink limitation.
Preliminary conclusion:
1. The non-XA JDBC sink (`is_exactly_once=false`) does not keep recoverable
   writer state, so end-to-end exactly-once is not guaranteed in this
   configuration.
2. However, the more critical problem is in Kafka source recovery:
   - the reader snapshots the current split offset into checkpoint state;
   - but during restore, the Kafka enumerator forces the restored startup
     mode to `GROUP_OFFSETS`;
   - and `addSplitsBack()` rewrites the restored split start offset to
     `endOffset + 1`, which advances a split to its next range and is not
     correct for failover recovery.
3. As a result, after recovery the source may seek to a later offset than
   the one recorded in the last completed checkpoint, which can explain the
   skipped range you observed.
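To make the size of the gap concrete, here is a minimal, self-contained sketch of the rewrite described in point 2. It is not the SeaTunnel code: `RestoreGapSketch`, `Split`, `rewriteToNextRange` and `skippedOffsets` are hypothetical names, and the sketch assumes that `startOffset` is the next offset to read and that `endOffset` is inclusive.

```java
// Hypothetical model of the restore-path rewrite; not the actual SeaTunnel classes.
final class RestoreGapSketch {

    /** Simplified split: startOffset is the next offset to read (assumption). */
    static final class Split {
        final long startOffset;
        final long endOffset;

        Split(long startOffset, long endOffset) {
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Models the rewrite under discussion: the new start becomes endOffset + 1. */
    static Split rewriteToNextRange(Split split, long latestOffset) {
        return new Split(split.endOffset + 1, latestOffset);
    }

    /** Number of offsets that are never read if the reader resumes from 'resumed'. */
    static long skippedOffsets(Split checkpointed, Split resumed) {
        return Math.max(0L, resumed.startOffset - checkpointed.startOffset);
    }

    public static void main(String[] args) {
        // The checkpoint recorded position 100 of a split that ends at 500.
        Split checkpointed = new Split(100L, 500L);
        // After failover the split is handed back and rewritten.
        Split resumed = rewriteToNextRange(checkpointed, 900L);
        System.out.println("resumed at " + resumed.startOffset
                + ", skipped " + skippedOffsets(checkpointed, resumed) + " offsets");
    }
}
```

Under these assumptions the reader resumes at 501 instead of 100, so offsets 100 through 500 are never re-read after the failover.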
Relevant code paths:
- `KafkaRecordEmitter` updates the current offset
- `KafkaSourceReader.snapshotState()` stores split offsets
- `SourceFlowLifeCycle.restoreState()` sends restored splits back to the
  enumerator
- `KafkaSourceSplitEnumerator.addSplitsBack()` / restore logic rewrites the
  split start position
- non-XA `JdbcSinkWriter.snapshotState()` returns empty state
So at this point, my understanding is:
- the JDBC sink configuration explains why exactly-once cannot be
  guaranteed end-to-end;
- but the observed offset jump is more likely caused by incorrect Kafka
  source restore semantics.
Proposed fix direction:
- preserve the checkpointed split start offset during restore;
- do not force restored startup mode to `GROUP_OFFSETS`;
- separate “split re-assignment after restore” from “bounded split
move-to-next-range” logic.
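One possible shape for that separation, as a rough sketch only: the enumerator is told why a split is coming back and picks the start offset accordingly. `SplitReassignmentSketch`, `Reason` and `nextStartOffset` are hypothetical names for illustration and do not exist in SeaTunnel. The sketch assumes the checkpointed offset is the next offset to read.

```java
// Hypothetical sketch of the proposed separation; not an actual SeaTunnel API.
final class SplitReassignmentSketch {

    /** Why a split is being handed back to the enumerator (assumed distinction). */
    enum Reason {
        RESTORED_FROM_CHECKPOINT,
        BOUNDED_RANGE_FINISHED
    }

    /**
     * Chooses the start offset for a split that is about to be re-assigned.
     * A restored split keeps its checkpointed position; only a finished
     * bounded range advances to endOffset + 1.
     */
    static long nextStartOffset(Reason reason, long checkpointedStart, long endOffset) {
        switch (reason) {
            case RESTORED_FROM_CHECKPOINT:
                // The checkpointed offset takes precedence, so no startup mode
                // (such as GROUP_OFFSETS) needs to be consulted on this path.
                return checkpointedStart;
            case BOUNDED_RANGE_FINISHED:
                return endOffset + 1;
            default:
                throw new IllegalArgumentException("Unknown reason: " + reason);
        }
    }

    public static void main(String[] args) {
        System.out.println(nextStartOffset(Reason.RESTORED_FROM_CHECKPOINT, 100L, 500L));
        System.out.println(nextStartOffset(Reason.BOUNDED_RANGE_FINISHED, 100L, 500L));
    }
}
```

A focused test for restored split offset recovery could then assert that the restore path returns the checkpointed offset unchanged, independently of the bounded-range path.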
If needed, I can prepare a patch and add focused tests for restored split
offset recovery.