DanielLeens commented on issue #10899:
URL: https://github.com/apache/seatunnel/issues/10899#issuecomment-4483713296

   Thanks for the detailed config and logs. I traced the current MySQL CDC 
recovery path in `dev`, and this does look like a real bug in the `timestamp` 
startup flow rather than expected checkpoint behavior.
   
   The key problem is that SeaTunnel does keep updating the incremental split 
offset during runtime / checkpointing, but the MySQL reader does not reuse that 
restored split offset when `startup.mode = timestamp`:
   
   - `IncrementalSourceRecordEmitter` keeps advancing the incremental split 
offset as schema-change, data-change, and heartbeat records are emitted;
   - `IncrementalSplitState.toSourceSplit()` persists that updated offset into 
checkpoint state for restore;
   - but `MySqlSourceFetchTaskContext#getInitOffset(...)` special-cases 
`StartupMode.TIMESTAMP` and always re-runs `findBinlogOffsetBytimestamp(...)` 
from the original `startup.timestamp`, instead of using the restored split 
offset;
   - `MySqlBinlogFetchTask` then starts again with the timestamp filter, which 
matches the recovery log you shared.
   
   So under the normal recovery path, `startup.mode = timestamp` currently 
does not honor the checkpointed binlog offset. If the original timestamp maps 
to a purged or corrupted binlog file, recovery falls back to that unavailable 
historical position even though a newer checkpointed offset existed before the 
failure.
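
   To make the path above concrete, here is a minimal Java sketch of the 
behavior as I understand it. It is a simplified model, not SeaTunnel code: 
`TimestampRecoveryModel`, `BinlogPosition`, `lookupByTimestamp`, and 
`initOffset` are hypothetical names, and the binlog file names are made up for 
illustration.

```java
// Hypothetical, simplified model; these are not SeaTunnel classes.
final class TimestampRecoveryModel {
    enum StartupMode { INITIAL, TIMESTAMP }

    /** A binlog position: file name plus byte offset within that file. */
    static final class BinlogPosition {
        final String file;
        final long pos;

        BinlogPosition(String file, long pos) {
            this.file = file;
            this.pos = pos;
        }
    }

    /** Stand-in for the timestamp lookup; a fixed timestamp always maps to the same position. */
    static BinlogPosition lookupByTimestamp(long startupTimestampMs) {
        return new BinlogPosition("mysql-bin.000001", 4L);
    }

    /** Models the reported behavior: TIMESTAMP mode ignores the offset carried by the split. */
    static BinlogPosition initOffset(
            StartupMode mode, long startupTimestampMs, BinlogPosition splitOffset) {
        if (mode == StartupMode.TIMESTAMP) {
            // The restored split offset is never consulted on this branch.
            return lookupByTimestamp(startupTimestampMs);
        }
        return splitOffset;
    }
}
```

   In this model, calling `initOffset` with `StartupMode.TIMESTAMP` and a 
checkpointed position in a later file still returns the position derived from 
the original timestamp, which is the symptom in the recovery log.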
   
   This issue is worth keeping open as a real bug in MySQL CDC checkpoint 
recovery semantics for the `timestamp` startup mode.
   
   As a temporary workaround, please do not rely on `startup.mode = timestamp` 
for long-running jobs that need checkpoint-based resume. In the current 
implementation, that mode is safer as a bootstrap-only start strategy than as a 
recovery anchor. If recovery is required after purge/corruption, you may need 
to restart from a fresh snapshot or from another valid binlog position.
   
   A focused fix would likely need to make the restored incremental split 
offset take priority over the original `startup.timestamp` during recovery, 
while keeping timestamp-based bootstrap only for the very first start.
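
   As a rough illustration of that priority rule, here is a minimal Java 
sketch. It is not a patch against SeaTunnel: `RestoredOffsetFirst`, 
`BinlogPosition`, `TimestampLookup`, and `resolve` are hypothetical names, and 
it assumes that a first start can be recognized by the absence of a restored 
offset (modeled here as `null`).

```java
// Hypothetical sketch of "restored offset wins over startup.timestamp".
final class RestoredOffsetFirst {

    /** A binlog position: file name plus byte offset within that file. */
    static final class BinlogPosition {
        final String file;
        final long pos;

        BinlogPosition(String file, long pos) {
            this.file = file;
            this.pos = pos;
        }
    }

    /** Hypothetical hook standing in for the timestamp-to-offset search. */
    interface TimestampLookup {
        BinlogPosition find(long startupTimestampMs);
    }

    /**
     * Returns the restored offset when one exists; otherwise resolves the
     * starting position from the configured timestamp (first start only).
     */
    static BinlogPosition resolve(
            BinlogPosition restoredOffset, long startupTimestampMs, TimestampLookup lookup) {
        if (restoredOffset != null) {
            // Recovery: do not touch the original timestamp or old binlog files.
            return restoredOffset;
        }
        // Assumed first start: bootstrap from the timestamp.
        return lookup.find(startupTimestampMs);
    }
}
```

   The point of the ordering is that the timestamp lookup is never invoked 
during recovery, so a purged binlog file behind the original timestamp cannot 
affect the restart. How the real split state distinguishes a first start from a 
restore is something an actual fix would need to confirm.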
   
   Thanks again for the precise report.
   


