DanielLeens commented on issue #10899: URL: https://github.com/apache/seatunnel/issues/10899#issuecomment-4483713296
Thanks for the detailed config and logs.

I traced the current MySQL CDC recovery path in `dev`, and this does look like a real bug in the `timestamp` startup flow rather than expected checkpoint behavior.

The key problem is that SeaTunnel does keep updating the incremental split offset during runtime / checkpointing, but the MySQL reader does not reuse that restored split offset when `startup.mode = timestamp`:

- `IncrementalSourceRecordEmitter` keeps advancing the incremental split offset as schema-change, data-change, and heartbeat records are emitted;
- `IncrementalSplitState.toSourceSplit()` persists that updated offset into checkpoint state for restore;
- but `MySqlSourceFetchTaskContext#getInitOffset(...)` special-cases `StartupMode.TIMESTAMP` and always re-runs `findBinlogOffsetBytimestamp(...)` from the original `startup.timestamp`, instead of using the restored split offset;
- `MySqlBinlogFetchTask` then starts again with the timestamp filter, which matches the recovery log you shared.

So under the normal recovery path, `startup.mode = timestamp` is currently not honoring the checkpointed binlog offset. If the original timestamp maps to a purged or corrupted binlog file, recovery can fall back onto an unavailable historical position even though a newer checkpointed offset existed before the failure.

This issue is worth keeping open as a real bug in MySQL CDC checkpoint recovery semantics for the `timestamp` startup mode.

As a temporary workaround, please do not rely on `startup.mode = timestamp` for long-running jobs that need checkpoint-based resume. In the current implementation, that mode is safer as a bootstrap-only start strategy than as a recovery anchor. If recovery is required after purge/corruption, you may need to restart from a fresh snapshot or from another valid binlog position.
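
To make the recovery path described above concrete, here is a minimal, self-contained Java sketch. It is not SeaTunnel code: `TimestampRecoverySketch`, `BinlogOffset`, `findOffsetByTimestamp`, `timestampAlwaysWins`, and `restoredOffsetFirst` are hypothetical stand-ins, and the binlog file names are made up. It only models the ordering question: whether the offset restored from a checkpoint or the original `startup.timestamp` decides where reading resumes.

```java
import java.util.Optional;

// Illustrative model only; none of these names exist in SeaTunnel.
final class TimestampRecoverySketch {

    /** Hypothetical stand-in for a binlog position (file name plus byte position). */
    static final class BinlogOffset {
        final String file;
        final long position;

        BinlogOffset(String file, long position) {
            this.file = file;
            this.position = position;
        }

        @Override
        public String toString() {
            return file + ":" + position;
        }
    }

    /**
     * Hypothetical stand-in for the timestamp-to-offset lookup.
     * Assumption: the original startup timestamp maps to an old binlog file.
     */
    static BinlogOffset findOffsetByTimestamp(long startupTimestampMs) {
        return new BinlogOffset("mysql-bin.000001", 4L);
    }

    /** Models the behavior described above: the restored offset is ignored. */
    static BinlogOffset timestampAlwaysWins(long startupTimestampMs,
                                            Optional<BinlogOffset> restored) {
        return findOffsetByTimestamp(startupTimestampMs);
    }

    /** Models the alternative ordering: use the restored offset when one exists. */
    static BinlogOffset restoredOffsetFirst(long startupTimestampMs,
                                            Optional<BinlogOffset> restored) {
        return restored.orElseGet(() -> findOffsetByTimestamp(startupTimestampMs));
    }

    public static void main(String[] args) {
        long startupTimestampMs = 1_700_000_000_000L;
        Optional<BinlogOffset> checkpointed =
                Optional.of(new BinlogOffset("mysql-bin.000042", 1024L));

        // On recovery, the first variant goes back to the old file even though
        // a newer checkpointed offset is available.
        System.out.println("timestamp always wins: "
                + timestampAlwaysWins(startupTimestampMs, checkpointed));
        System.out.println("restored offset first: "
                + restoredOffsetFirst(startupTimestampMs, checkpointed));

        // On the very first start there is no restored offset, so both variants
        // bootstrap from the timestamp.
        System.out.println("first start:           "
                + restoredOffsetFirst(startupTimestampMs, Optional.empty()));
    }
}
```

In this model, if `mysql-bin.000001` has been purged, only the second variant resumes from a position that still exists, which is the failure mode reported in this issue.
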
A focused fix would likely need to make the restored incremental split offset take priority over the original `startup.timestamp` during recovery, while keeping timestamp-based bootstrap only for the very first start.

Thanks again for the precise report.
