GabrielBBaldez opened a new pull request, #11053:
URL: https://github.com/apache/seatunnel/pull/11053

   ### Purpose of this pull request
   
   Closes #11051. Related user report: #9511.
   
   Adds an explicit `latest` startup mode to `MongoDB-CDC`, so a job can skip 
the initial snapshot and consume only changes made after it starts.
   
   ### What changed
   
   **Option (`MongodbIncrementalSourceOptions`)**
   - `startup.mode` now accepts `latest` (choices: `initial`, `latest`, 
`timestamp` — was `initial`, `timestamp`).
   - Fixed the stale description, which advertised `earliest`/`specific` values 
the connector never accepted; it now describes exactly what each supported mode 
does.
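
The accepted-values rule above can be illustrated with a minimal, self-contained sketch. `StartupModeChoice` and its methods are hypothetical names invented for this example, not the `MongodbIncrementalSourceOptions` API, and the case-insensitive matching is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the connector's startup-mode option; not the SeaTunnel API.
enum StartupModeChoice {
    INITIAL,
    LATEST,
    TIMESTAMP;

    // Lower-case names, as they would be written in a job config.
    static List<String> supported() {
        List<String> names = new ArrayList<>();
        for (StartupModeChoice mode : values()) {
            names.add(mode.name().toLowerCase(Locale.ROOT));
        }
        return names;
    }

    // Resolve a raw `startup.mode` value, rejecting anything outside the supported set.
    // Case-insensitive matching is an assumption of this sketch.
    static StartupModeChoice parse(String raw) {
        for (StartupModeChoice mode : values()) {
            if (mode.name().equalsIgnoreCase(raw)) {
                return mode;
            }
        }
        throw new IllegalArgumentException(
                "Unsupported startup.mode '" + raw + "', expected one of " + supported());
    }
}

final class StartupModeChoiceDemo {
    public static void main(String[] args) {
        System.out.println(StartupModeChoice.supported());
        System.out.println(StartupModeChoice.parse("latest"));
        try {
            // `earliest` was advertised by the stale description but is not a supported choice.
            StartupModeChoice.parse("earliest");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model, `earliest` and `specific` fail fast at parse time instead of being silently advertised.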
   
   **Why no runtime changes were needed**
   
   The cdc-base runtime already implements the requested semantics — this 
connector just never exposed the mode:
   
   - `IncrementalSource#createEnumerator` routes any non-`INITIAL` mode to 
`IncrementalSplitAssigner` (stream-only), so the snapshot/copy phase is skipped 
entirely.
   - `StartupConfig#getStartupOffset(LATEST)` resolves the start position 
through `ChangeStreamOffsetFactory#latest()`, which MongoDB-CDC already 
implements (current change-stream position).
   - Restore goes through the same checkpoint path: an `IncrementalPhaseState` 
checkpoint restores into `IncrementalSplitAssigner`, so a restarted job resumes 
from the checkpointed change-stream position and never falls back into a 
snapshot.
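
The routing and restore behavior in the three points above can be modeled roughly as follows. This is a simplified sketch, not the cdc-base source: every type here (`ModeSketch`, `CheckpointSketch`, `AssignerSketch`, `AssignerRoutingSketch`) is a hypothetical stand-in, and the real enumerator has more state than a two-argument function.

```java
// Hypothetical names throughout: a simplified model of the routing described above,
// not the cdc-base implementation.
enum ModeSketch { INITIAL, LATEST, TIMESTAMP }

// What kind of checkpoint, if any, the job is restoring from.
enum CheckpointSketch { NONE, INCREMENTAL_PHASE }

// The two assigner behaviors the description distinguishes.
enum AssignerSketch { SNAPSHOT_THEN_STREAM, STREAM_ONLY }

final class AssignerRoutingSketch {
    static AssignerSketch choose(ModeSketch mode, CheckpointSketch restoredFrom) {
        // A checkpoint taken in the incremental phase always restores into the
        // stream-only assigner, so a restarted job never re-enters the snapshot.
        if (restoredFrom == CheckpointSketch.INCREMENTAL_PHASE) {
            return AssignerSketch.STREAM_ONLY;
        }
        // Fresh start: only INITIAL runs the snapshot/copy phase first.
        return mode == ModeSketch.INITIAL
                ? AssignerSketch.SNAPSHOT_THEN_STREAM
                : AssignerSketch.STREAM_ONLY;
    }

    public static void main(String[] args) {
        for (ModeSketch mode : ModeSketch.values()) {
            System.out.println(mode + " -> " + choose(mode, CheckpointSketch.NONE));
        }
        System.out.println("LATEST after restore -> "
                + choose(ModeSketch.LATEST, CheckpointSketch.INCREMENTAL_PHASE));
    }
}
```

Under these assumptions, `LATEST` maps to the stream-only path both on a fresh start and after a restore, which is why exposing the option is enough.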
   
   **Tests (`MongodbIncrementalSourceFactoryTest`)**
   - Updated the option-rule test to assert the supported startup modes are 
exactly `initial`, `latest`, `timestamp`.
   - New test proving `StartupMode.LATEST` resolves through 
`ChangeStreamOffsetFactory` to a `ChangeStreamOffset` positioned at the current 
time (timestamp set, no resume token) — i.e. "new changes only".
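
The shape the new test checks for can be sketched without any SeaTunnel dependencies. `OffsetSketch` and `OffsetFactorySketch` are hypothetical stand-ins for `ChangeStreamOffset` and `ChangeStreamOffsetFactory`. The injected clock and the seconds-based timestamp are assumptions made so the example is deterministic.

```java
import java.util.function.LongSupplier;

// Hypothetical model of a change-stream offset: a timestamp plus an optional resume token.
final class OffsetSketch {
    private final long timestampSeconds;
    private final String resumeToken; // null means "no resume token"

    OffsetSketch(long timestampSeconds, String resumeToken) {
        this.timestampSeconds = timestampSeconds;
        this.resumeToken = resumeToken;
    }

    long timestampSeconds() {
        return timestampSeconds;
    }

    boolean hasResumeToken() {
        return resumeToken != null;
    }
}

// Hypothetical factory; the clock is injected so the result can be asserted exactly.
final class OffsetFactorySketch {
    private final LongSupplier clockSeconds;

    OffsetFactorySketch(LongSupplier clockSeconds) {
        this.clockSeconds = clockSeconds;
    }

    // "Latest" means: start at the current time, with no token from an earlier stream.
    OffsetSketch latest() {
        return new OffsetSketch(clockSeconds.getAsLong(), null);
    }
}

final class OffsetFactorySketchDemo {
    public static void main(String[] args) {
        OffsetFactorySketch factory =
                new OffsetFactorySketch(() -> System.currentTimeMillis() / 1000L);
        OffsetSketch offset = factory.latest();
        System.out.println("timestamp set: " + (offset.timestampSeconds() > 0));
        System.out.println("resume token present: " + offset.hasResumeToken());
    }
}
```

A timestamp with no resume token is what makes the start position "new changes only": there is no earlier stream position to replay from.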
   
   **Docs (EN + ZH)**
   - Documented `startup.mode` and `startup.timestamp` in the options table 
(they were previously undocumented).
   - Added a "Startup Mode" section explaining the three modes, 
checkpoint/restore behavior, and a `startup.mode = "latest"` example job.
   
   ### Scope notes (matching the issue's non-goals)
   
   - No dynamic newly-added collection discovery, no new metadata keys.
   - `earliest` is intentionally **not** exposed: 
`ChangeStreamOffsetFactory#earliest()` currently returns the current timestamp 
(same as `latest`), so advertising it would be misleading.
   
   ### Verification
   
   - `mvn test -pl seatunnel-connectors-v2/connector-cdc/connector-cdc-mongodb` 
— 17/17 passing (JDK 11).
   - `mvn spotless:apply` clean.
   
   ### Check list
   
   * [x] Code changes are covered with tests, or do not need tests for a 
stated reason
   * [ ] If your PR adds any new Jar binary package, please add a License 
Notice according to the [New License 
Guide](https://github.com/apache/seatunnel/blob/dev/docs/en/contribution/new-license.md)
   * [x] If necessary, please update the documentation to describe the new 
feature. https://github.com/apache/seatunnel/tree/dev/docs

