GabrielBBaldez opened a new pull request, #11057:
URL: https://github.com/apache/seatunnel/pull/11057

   ### Purpose of this pull request
   
   Closes #11036.
   
   Adds a `snapshot` startup mode to `MySQL-CDC`: a bounded bootstrap job that 
reads the snapshot of the captured tables and then finishes on its own, without 
entering the incremental/binlog phase. Useful for one-time backfill, initial 
warehouse/table bootstrap, and controlled migration stages.
   
   ### What changed
   
   **Option surface (`MySqlIncrementalSourceOptions`)**
   - `startup.mode` now accepts `snapshot` (choices: `initial`, `snapshot`, 
`earliest`, `latest`, `specific`, `timestamp`), with the description explaining 
the bounded semantics.
   
   **Runtime (`connector-cdc-base`)**
   - `StartupMode` gains a `SNAPSHOT` constant; 
`StartupConfig.getStartupOffset` treats it like `INITIAL` (the snapshot phase 
needs no stream offset).
   - `HybridSplitAssigner` gains a `snapshotOnly` flag: the existing snapshot 
split planning/reading path is fully reused, but once the snapshot phase 
completes no incremental split is handed out and `waitingForCompletedSplits()` 
no longer considers the incremental assigner — so the enumerator's existing 
logic signals no-more-splits and the readers finish.
   - `IncrementalSource#getBoundedness` reports `BOUNDED` in snapshot mode, 
letting the job run as `BATCH` and finish naturally instead of idling in a 
streaming state.
   - Fail-fast validation at source creation for incompatible combinations: 
`startup.mode = snapshot` together with `stop.mode != never`, or with 
`startup.specific-offset.file` / `startup.specific-offset.pos` / 
`startup.timestamp`, is rejected with a clear message.
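
The fail-fast rule in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the connector's actual code: the class `SnapshotModeValidation`, its `validate` method, and the plain `Map<String, String>` option holder are hypothetical, and treating an unset `stop.mode` as `never` is an assumption. Only the option keys come from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the fail-fast check; not the connector's real class. */
final class SnapshotModeValidation {

    // Option keys as named in this PR description.
    static final String STARTUP_MODE = "startup.mode";
    static final String STOP_MODE = "stop.mode";
    static final List<String> STREAM_ONLY_KEYS = List.of(
            "startup.specific-offset.file",
            "startup.specific-offset.pos",
            "startup.timestamp");

    /** Throws IllegalArgumentException if snapshot mode is mixed with stream-phase options. */
    static void validate(Map<String, String> options) {
        if (!"snapshot".equalsIgnoreCase(options.get(STARTUP_MODE))) {
            return; // other startup modes are out of scope for this sketch
        }
        List<String> conflicts = new ArrayList<>();
        // Assumption: an unset stop.mode behaves like "never".
        String stopMode = options.get(STOP_MODE);
        if (stopMode != null && !"never".equalsIgnoreCase(stopMode)) {
            conflicts.add(STOP_MODE + " = " + stopMode);
        }
        for (String key : STREAM_ONLY_KEYS) {
            if (options.containsKey(key)) {
                conflicts.add(key);
            }
        }
        if (!conflicts.isEmpty()) {
            throw new IllegalArgumentException(
                    "startup.mode = snapshot is bounded and cannot be combined with: "
                            + conflicts);
        }
    }
}
```

Collecting every conflict before throwing lets a single error message list all offending options instead of surfacing them one at a time.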
   
   **Checkpoint / finish semantics**
   - Snapshot-only jobs checkpoint through the unchanged 
`HybridPendingSplitsState` path, so checkpointing during the snapshot keeps 
working. On restore the assigner is rebuilt with the same snapshot-only flag 
(derived from config), so a restarted job resumes the snapshot phase where it 
left off instead of starting it over or falling through to streaming.
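
The snapshot-only behaviour and the restore path described above can be modelled with a small toy assigner. Everything here is a hypothetical simplification: `SketchHybridAssigner`, its string-based splits, and the `restore` helper are not the `connector-cdc-base` types, and a split counts as completed as soon as it is handed out, which the real assigner does not assume. The point is only that the flag comes from config on restore rather than from the checkpointed state.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

/** Simplified, hypothetical model of a hybrid split assigner. */
final class SketchHybridAssigner {
    static final String INCREMENTAL_SPLIT = "incremental-split";

    private final Deque<String> remainingSnapshotSplits;
    private final boolean snapshotOnly;
    private boolean incrementalAssigned;

    SketchHybridAssigner(List<String> remainingSnapshotSplits, boolean snapshotOnly) {
        this.remainingSnapshotSplits = new ArrayDeque<>(remainingSnapshotSplits);
        this.snapshotOnly = snapshotOnly;
    }

    /** Hands out snapshot splits first; the incremental split only in hybrid mode. */
    Optional<String> getNext() {
        if (!remainingSnapshotSplits.isEmpty()) {
            return Optional.of(remainingSnapshotSplits.poll());
        }
        if (snapshotOnly || incrementalAssigned) {
            return Optional.empty(); // nothing left: the enumerator can signal no-more-splits
        }
        incrementalAssigned = true;
        return Optional.of(INCREMENTAL_SPLIT);
    }

    /** In snapshot-only mode the incremental phase is never waited on. */
    boolean waitingForCompletedSplits() {
        boolean snapshotPending = !remainingSnapshotSplits.isEmpty();
        return snapshotOnly ? snapshotPending : snapshotPending || !incrementalAssigned;
    }

    /** State written at checkpoint time: only the unassigned snapshot splits. */
    List<String> snapshotState() {
        return List.copyOf(remainingSnapshotSplits);
    }

    /** On restore, the flag is re-derived from config, not read from the state. */
    static SketchHybridAssigner restore(List<String> checkpointedSplits, String startupMode) {
        return new SketchHybridAssigner(
                checkpointedSplits, "snapshot".equalsIgnoreCase(startupMode));
    }
}
```

Because the flag is recomputed from `startup.mode`, the checkpoint format in this model needs no new field, which mirrors the claim that the `HybridPendingSplitsState` path is unchanged.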
   
   **Tests**
   - `HybridSplitAssignerTest#testSnapshotOnlyFinishesAfterSnapshotPhase`: with 
a completed snapshot phase, the snapshot-only assigner returns no next split 
and is not waiting (job can finish), while the default hybrid behavior on the 
same state keeps waiting to hand out the incremental split.
   - `MySqlIncrementalSourceFactoryTest#testSupportedStartUpModes`: asserts the 
supported startup modes, including `snapshot`.
   - New e2e case `testMysqlCdcSnapshotOnlyStartupMode` (+ 
`mysqlcdc_snapshot_only.conf`, `BATCH` job): seeds the source table, runs the 
job synchronously and asserts it exits 0 on its own, asserts the sink equals 
the snapshot, then mutates the source after completion and asserts the sink is 
unchanged (no binlog consumption).
   
   **Docs (EN + ZH)**
   - `startup.mode` option row updated and a "Snapshot-only bootstrap" section 
added with a `BATCH` example and the incompatible-options note.
   
   ### Scope notes (matching the issue's non-goals)
   
   - No GTID-based startup, no skip-events/skip-rows trimming, no dynamic 
newly-added table capture, no schema evolution policy changes.
   
   ### Verification
   
   - `mvn install -pl connector-cdc-base,connector-cdc-mysql` — all module 
tests passing locally (JDK 11), including the new ones.
   - `mvn test-compile -pl connector-cdc-mysql-e2e` — e2e module compiles; the 
new IT runs in CI (needs Docker).
   - `mvn spotless:apply` clean.
   
   ### Check list
   
   * [x] Changed code is covered by tests, or does not need tests for a 
stated reason
   * [ ] If your PR adds any new Jar binary package, please add a License 
Notice according to the [New License 
Guide](https://github.com/apache/seatunnel/blob/dev/docs/en/contribution/new-license.md)
   * [x] If necessary, please update the documentation to describe the new 
feature. https://github.com/apache/seatunnel/tree/dev/docs
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

