yugeeklab opened a new pull request, #8207:
URL: https://github.com/apache/paimon/pull/8207

### Purpose

Linked issue: close #8205

`PaimonMicroBatchStream#planInputPartitions` clamped the checkpointed start offset up to `initOffset` whenever it compared lower. `initOffset` is recomputed from the current table state on every restart, so with scan modes like `latest-full` it always points at the current snapshot with `scanSnapshot=true`. Any restarted query therefore dropped its valid checkpointed position, silently skipped the changelog gap, and re-emitted the entire snapshot as `+I` rows.

This PR falls back to `initOffset` only when the checkpointed snapshot has actually expired (it is older than `earliestSnapshotId`); otherwise the query resumes from the checkpointed offset as-is. A warning is logged when the fallback is taken.

### Tests

Verified against a production streaming query with a 5-minute trigger (~300k-row source): with the fix, the first batch after a restart resumes from the checkpointed offset and reads only the changelog accumulated during the downtime; without it, restarts produced one empty batch followed by a full-snapshot re-emission (offset WAL evidence in #8205).

I did not find an existing unit-test harness for simulating restarts of `PaimonMicroBatchStream`; happy to add one if maintainers can point at a preferred pattern.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
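
For readers skimming the Purpose section, the fallback rule can be illustrated with a small language-neutral sketch. This is not the code in the PR: `Offset`, `clamp_before_fix`, and `resolve_start_offset` are hypothetical names, and ordering offsets by `(snapshot_id, index)` is an assumption made only for illustration.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("offset_sketch")


@dataclass(frozen=True)
class Offset:
    """Hypothetical stand-in for a streaming source offset."""
    snapshot_id: int
    index: int
    scan_snapshot: bool

    def key(self):
        # Assumed ordering: by snapshot id first, then by index.
        return (self.snapshot_id, self.index)


def clamp_before_fix(checkpointed: Offset, init_offset: Offset) -> Offset:
    """Behaviour described as the bug: any checkpointed offset that
    compares lower than init_offset is replaced by init_offset."""
    if checkpointed.key() < init_offset.key():
        return init_offset
    return checkpointed


def resolve_start_offset(checkpointed: Offset, init_offset: Offset,
                         earliest_snapshot_id: int) -> Offset:
    """Behaviour described as the fix: fall back to init_offset only
    when the checkpointed snapshot has expired."""
    if checkpointed.snapshot_id < earliest_snapshot_id:
        logger.warning(
            "Checkpointed snapshot %d is older than earliest snapshot %d; "
            "falling back to the initial offset",
            checkpointed.snapshot_id, earliest_snapshot_id)
        return init_offset
    return checkpointed


if __name__ == "__main__":
    ckpt = Offset(snapshot_id=10, index=0, scan_snapshot=False)
    init = Offset(snapshot_id=15, index=0, scan_snapshot=True)
    # Snapshot 10 is still retained, so the query resumes from it.
    print(resolve_start_offset(ckpt, init, earliest_snapshot_id=5))
    # Snapshot 10 has expired, so the initial offset is used instead.
    print(resolve_start_offset(ckpt, init, earliest_snapshot_id=12))
```

In this sketch the pre-fix rule returns the full-snapshot offset in both cases, which corresponds to the re-emission reported in #8205, while the post-fix rule does so only when the checkpointed snapshot is no longer available.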
