GabrielBBaldez opened a new pull request, #11053: URL: https://github.com/apache/seatunnel/pull/11053
### Purpose of this pull request

Closes #11051. Related user report: #9511.

Adds an explicit `latest` startup mode to `MongoDB-CDC`, so a job can skip the initial snapshot and consume only changes made after it starts.

### What changed

**Option (`MongodbIncrementalSourceOptions`)**

- `startup.mode` now accepts `latest` (choices: `initial`, `latest`, `timestamp`; previously `initial`, `timestamp`).
- Fixed the stale description, which advertised `earliest`/`specific` values the connector never accepted; it now describes exactly what each supported mode does.

**Why no runtime changes were needed**

The cdc-base runtime already implements the requested semantics; this connector just never exposed the mode:

- `IncrementalSource#createEnumerator` routes any non-`INITIAL` mode to `IncrementalSplitAssigner` (stream-only), so the snapshot/copy phase is skipped entirely.
- `StartupConfig#getStartupOffset(LATEST)` resolves the start position through `ChangeStreamOffsetFactory#latest()`, which MongoDB-CDC already implements (current change-stream position).
- Restore goes through the same checkpoint path: an `IncrementalPhaseState` checkpoint restores into `IncrementalSplitAssigner`, so a restarted job resumes from the checkpointed change-stream position and never falls back into a snapshot.

**Tests (`MongodbIncrementalSourceFactoryTest`)**

- Updated the option-rule test to assert the supported startup modes are exactly `initial`, `latest`, `timestamp`.
- New test proving `StartupMode.LATEST` resolves through `ChangeStreamOffsetFactory` to a `ChangeStreamOffset` positioned at the current time (timestamp set, no resume token), i.e. "new changes only".

**Docs (EN + ZH)**

- Documented `startup.mode` and `startup.timestamp` in the options table (they were previously undocumented).
- Added a "Startup Mode" section explaining the three modes, checkpoint/restore behavior, and a `startup.mode = "latest"` example job.
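
For reviewers who want a quick picture of the new option in use, here is a minimal sketch of a streaming job with `startup.mode = "latest"`. It is not copied from the docs added in this PR. The host, database, collection, credentials, schema fields, and checkpoint interval are placeholder assumptions, and the exact `schema` layout may differ between SeaTunnel versions.

```hocon
env {
  parallelism = 1
  job.mode = "STREAMING"
  # Assumed interval; checkpoints are what let a restart resume from the stream position.
  checkpoint.interval = 5000
}

source {
  MongoDB-CDC {
    # Placeholder connection details (assumptions, not taken from the PR).
    hosts = "mongo0:27017"
    database = ["inventory"]
    collection = ["inventory.products"]
    username = "stuser"
    password = "stpw"

    # Skip the initial snapshot and read only changes made after the job starts.
    startup.mode = "latest"

    # Hypothetical schema for illustration only.
    schema = {
      fields {
        "_id" = string
        "name" = string
      }
    }
  }
}

sink {
  Console {
  }
}
```

With this mode, documents already in the collection before the job starts are not emitted. A job restored from a checkpoint is expected to continue from the checkpointed change-stream position rather than re-evaluating `latest`.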
### Scope notes (matching the issue's non-goals)

- No dynamic newly-added collection discovery, no new metadata keys.
- `earliest` is intentionally **not** exposed: `ChangeStreamOffsetFactory#earliest()` currently returns the current timestamp (same as `latest`), so advertising it would be misleading.

### Verification

- `mvn test -pl seatunnel-connectors-v2/connector-cdc/connector-cdc-mongodb`: 17/17 passing (JDK 11).
- `mvn spotless:apply` clean.

### Check list

* [x] Code changed are covered with tests, or it does not need tests for reason
* [ ] If any new Jar binary package adding in your PR, please add License Notice according [New License Guide](https://github.com/apache/seatunnel/blob/dev/docs/en/contribution/new-license.md)
* [x] If necessary, please update the documentation to describe the new feature. https://github.com/apache/seatunnel/tree/dev/docs
