yzeng1618 commented on issue #10574:
URL: https://github.com/apache/seatunnel/issues/10574#issuecomment-4020702367

   Thanks for the detailed report.
   
   Based on code analysis, this looks like a real restore-path issue in the Kafka source, rather than only a JDBC sink limitation.
   
   Preliminary conclusion:
   1. The non-XA JDBC sink (`is_exactly_once=false`) does not keep recoverable writer state, so end-to-end exactly-once is not guaranteed in this configuration.
   2. However, the more critical problem is in Kafka source recovery:
      - the reader snapshots the current split offset into checkpoint state;
      - but during restore, the Kafka enumerator forces the restored startup mode to `GROUP_OFFSETS`;
      - and `addSplitsBack()` rewrites the restored split's start offset to `endOffset + 1`, which is not correct for failover recovery.
   3. As a result, after recovery the source may seek to a later offset than the one recorded in the last completed checkpoint, which would explain the skipped range you observed.
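
To make the offset arithmetic in point 2 concrete, here is a minimal Java sketch. It is not SeaTunnel code: `SplitState`, its fields, and the method names are hypothetical, and it assumes the checkpointed `startOffset` is the next offset to read while `endOffset` is the upper bound of the split's current range. Under those assumptions, restarting at `endOffset + 1` drops every record between the checkpoint and the end of the range.

```java
/**
 * Hypothetical model of restoring one Kafka split. SplitState and its fields are illustrative
 * stand-ins, not SeaTunnel's KafkaSourceSplit API.
 */
final class RestoreOffsetSketch {

    /** Checkpointed split on one partition (assumed semantics, see the lead-in). */
    static final class SplitState {
        final String topicPartition;
        final long startOffset; // assumed: next offset to read, as captured by the reader snapshot
        final long endOffset;   // assumed: upper bound of the split's current range

        SplitState(String topicPartition, long startOffset, long endOffset) {
            this.topicPartition = topicPartition;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Restore behaviour described in this comment: restart right after the split's end offset. */
    static long restoredStartAsReported(SplitState checkpointed) {
        return checkpointed.endOffset + 1;
    }

    /** Proposed behaviour: resume where the last completed checkpoint left off. */
    static long restoredStartPreserved(SplitState checkpointed) {
        return checkpointed.startOffset;
    }

    /** Records between the checkpointed position and the restored start that are never read. */
    static long skippedRecords(SplitState checkpointed, long restoredStart) {
        return Math.max(0, restoredStart - checkpointed.startOffset);
    }

    public static void main(String[] args) {
        // Hypothetical checkpoint: next record to read is 1000, split range ends at 1499.
        SplitState cp = new SplitState("orders-0", 1000, 1499);
        long reported = restoredStartAsReported(cp);
        long preserved = restoredStartPreserved(cp);
        System.out.println("as reported: seek to " + reported + ", skip " + skippedRecords(cp, reported));
        System.out.println("preserved:   seek to " + preserved + ", skip " + skippedRecords(cp, preserved));
    }
}
```

With these made-up numbers the reported path seeks to 1500 and skips 500 records, while preserving the checkpointed offset skips none.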
   
   Relevant code paths:
   - `KafkaRecordEmitter` updates the current offset
   - `KafkaSourceReader.snapshotState()` stores split offsets
   - `SourceFlowLifeCycle.restoreState()` sends restored splits back to the enumerator
   - `KafkaSourceSplitEnumerator.addSplitsBack()` / restore logic rewrites the split start position
   - non-XA `JdbcSinkWriter.snapshotState()` returns an empty state
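
The reader side of the first two paths can be sketched as below: an emitter role that advances a per-partition position and a snapshot role that copies it into checkpoint state. This is a hypothetical model, not the real `KafkaRecordEmitter` / `KafkaSourceReader`; it assumes the stored value is the next offset to read (last emitted offset + 1). The point it illustrates is that the checkpoint already carries the right position, so the offset is lost later, on the enumerator side.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical reader-side model: an emitter that advances the per-partition position and a
 * snapshot that copies it into checkpoint state. Names do not match SeaTunnel's classes.
 */
final class ReaderSnapshotSketch {

    // Assumed semantics: the stored value is the NEXT offset to read on each partition.
    private final Map<String, Long> nextOffsetByPartition = new LinkedHashMap<>();

    ReaderSnapshotSketch(Map<String, Long> assignedStartOffsets) {
        nextOffsetByPartition.putAll(assignedStartOffsets);
    }

    /** Emitter role: once a record has been handed downstream, move past it. */
    void onRecordEmitted(String partition, long offset) {
        nextOffsetByPartition.put(partition, offset + 1);
    }

    /** snapshotState() role: an immutable copy, so later emits do not change the checkpoint. */
    Map<String, Long> snapshotState() {
        return Map.copyOf(nextOffsetByPartition);
    }

    public static void main(String[] args) {
        ReaderSnapshotSketch reader = new ReaderSnapshotSketch(Map.of("orders-0", 1000L));
        for (long o = 1000; o < 1005; o++) {
            reader.onRecordEmitted("orders-0", o);
        }
        Map<String, Long> checkpoint = reader.snapshotState();
        reader.onRecordEmitted("orders-0", 1005); // emitted after the checkpoint was taken
        System.out.println(checkpoint); // {orders-0=1005}
    }
}
```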
   
   So at this point, my understanding is:
   - the JDBC sink configuration explains why exactly-once cannot be guaranteed end-to-end;
   - but the observed offset jump is more likely caused by incorrect Kafka source restore semantics.
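
One way to see why the two causes are distinguishable: with a correct source restore, a non-XA sink without writer state tends to produce duplicates (rows written after the last checkpoint are replayed), not gaps. The toy replay model below is hypothetical and not SeaTunnel code; it assumes one partition with offsets 0..9, a checkpoint at "next offset = 4", rows 4..6 already written by the sink before the failure, a split range ending at 7, and no deduplication in the sink (for example, no upsert on a primary key).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy replay model (hypothetical, not SeaTunnel code). One partition holds offsets 0..9.
 * The last completed checkpoint recorded "next offset = 4"; the non-XA sink then wrote
 * rows 4..6 before the job failed. The current split's range is assumed to end at 7.
 */
final class ReplaySketch {
    static final long TOTAL_RECORDS = 10;
    static final long CHECKPOINTED_NEXT = 4;
    static final long LAST_WRITTEN_BEFORE_FAILURE = 6;
    static final long SPLIT_END = 7;

    /** How many times each offset reached the sink if the source resumes at restoredStart. */
    static Map<Long, Integer> deliveries(long restoredStart) {
        List<Long> sinkRows = new ArrayList<>();
        // Before the failure. With no writer state, rows 4..6 are not rolled back on restore.
        for (long o = 0; o <= LAST_WRITTEN_BEFORE_FAILURE; o++) sinkRows.add(o);
        // After the restore: the source reads from restoredStart to the end of the partition.
        for (long o = restoredStart; o < TOTAL_RECORDS; o++) sinkRows.add(o);

        Map<Long, Integer> counts = new TreeMap<>();
        for (long o = 0; o < TOTAL_RECORDS; o++) counts.put(o, 0);
        for (long o : sinkRows) counts.merge(o, 1, Integer::sum);
        return counts;
    }

    /** Offsets written more than once: the at-least-once symptom of a non-XA sink. */
    static List<Long> duplicated(Map<Long, Integer> counts) {
        List<Long> result = new ArrayList<>();
        counts.forEach((offset, n) -> { if (n > 1) result.add(offset); });
        return result;
    }

    /** Offsets never written: the symptom reported in this issue. */
    static List<Long> missing(Map<Long, Integer> counts) {
        List<Long> result = new ArrayList<>();
        counts.forEach((offset, n) -> { if (n == 0) result.add(offset); });
        return result;
    }

    public static void main(String[] args) {
        Map<Long, Integer> preserved = deliveries(CHECKPOINTED_NEXT);
        Map<Long, Integer> reported = deliveries(SPLIT_END + 1);
        System.out.println("preserved: dup=" + duplicated(preserved) + " missing=" + missing(preserved));
        System.out.println("reported:  dup=" + duplicated(reported) + " missing=" + missing(reported));
    }
}
```

In this model, resuming at the checkpoint duplicates offsets 4, 5 and 6 but loses nothing, while resuming at `SPLIT_END + 1` loses offset 7, which matches the shape of the reported skip rather than a sink-only problem.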
   
   Proposed fix direction:
   - preserve the checkpointed split start offset during restore;
   - do not force the restored startup mode to `GROUP_OFFSETS`;
   - separate “split re-assignment after restore” from “bounded split move-to-next-range” logic.
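
A rough shape for the separation in the last bullet, as a hypothetical Java sketch (the `Split` type, the method names, and the inclusive-range assumption are illustrative, not SeaTunnel's API): restored splits are handed out again unchanged, with no startup mode re-applied, and only a fully consumed bounded split is moved to `endOffset + 1`.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical enumerator sketch: two separate code paths, so a restored split is never
 * pushed to its "next range" by accident. Split and method names are illustrative only.
 */
final class EnumeratorFixSketch {

    /** Immutable split, assumed to read the inclusive range [startOffset, endOffset]. */
    static final class Split {
        final String partition;
        final long startOffset;
        final long endOffset;

        Split(String partition, long startOffset, long endOffset) {
            this.partition = partition;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /**
     * Path 1, re-assignment after restore: hand the splits out again exactly as checkpointed.
     * No startup mode (e.g. GROUP_OFFSETS) is re-applied; the checkpointed offset wins.
     */
    static List<Split> reassignRestoredSplits(List<Split> restored) {
        return new ArrayList<>(restored);
    }

    /**
     * Path 2, bounded move-to-next-range: only for a split that has been fully consumed,
     * the next range starts right after the previous end offset.
     */
    static Split advanceToNextRange(Split finished, long newEndOffset) {
        if (newEndOffset <= finished.endOffset) {
            throw new IllegalArgumentException("next range must end after " + finished.endOffset);
        }
        return new Split(finished.partition, finished.endOffset + 1, newEndOffset);
    }

    public static void main(String[] args) {
        Split checkpointed = new Split("orders-0", 1000, 1499);
        Split restored = reassignRestoredSplits(List.of(checkpointed)).get(0);
        Split next = advanceToNextRange(checkpointed, 1999);
        System.out.println("restored starts at " + restored.startOffset); // 1000
        System.out.println("next range starts at " + next.startOffset);   // 1500
    }
}
```

Keeping the two paths as separate methods makes it hard for restore handling to reuse the `endOffset + 1` step by accident, and gives the focused restore tests a single entry point to check.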
   
   If needed, I can prepare a patch and add focused tests for restored split offset recovery.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
