davidzollo opened a new issue, #11050: URL: https://github.com/apache/seatunnel/issues/11050
## Background

SeaTunnel currently does not provide a native Vitess CDC source connector in the `connector-cdc` family. That leaves Vitess users without a first-class way to:

- capture initial table state and incremental changes in one SeaTunnel source
- participate in SeaTunnel checkpoint / restart semantics with a stable CDC position
- feed multi-table CDC rows into downstream multi-table sinks and schema evolution flows

There is historical discussion and even partial prototype work in the community, but the old issue is too stale to serve as a practical implementation ticket. This issue is intended to replace that with a claimable engineering scope.

## Scope

Add a new `connector-cdc-vitess` source connector under `seatunnel-connectors-v2/connector-cdc`.

This issue is for the **source connector only**.

## First delivery boundary

To keep the issue implementable, the first delivery should stay narrow:

- support explicitly configured tables or table patterns that can be resolved deterministically
- support a stable initial startup position plus continuous incremental capture
- integrate with SeaTunnel checkpoint / restore semantics
- emit SeaTunnel CDC rows compatible with existing multi-table downstream paths

If some Vitess deployment variants require materially different capture behavior, the first delivery should target one well-defined, reproducible path and explicitly defer broader compatibility.

## Suggested implementation approach

### 1. Isolate the Vitess capture backend inside its own module

Choose the Vitess change-capture backend first and keep the backend-specific assumptions inside `connector-cdc-vitess`.

The connector should avoid forcing unrelated generic CDC code to become Vitess-aware unless a reusable abstraction is clearly justified.

### 2. Follow existing CDC connector module layout

The module should include at least:

- connector-owned source options
- source config / config factory
- Vitess-specific source adapter or dialect layer
- offset representation / offset factory
- startup behavior integration
- docs and plugin metadata registration

Expected repository touch points include:

- `seatunnel-connectors-v2/connector-cdc/connector-cdc-vitess`
- `seatunnel-connectors-v2/connector-cdc/pom.xml`
- `plugin-mapping.properties`
- `seatunnel-dist/pom.xml`
- `config/plugin_config`
- `docs/en` and `docs/zh`
- `seatunnel-e2e`

### 3. Keep startup semantics explicit

The connector should expose SeaTunnel-owned startup semantics instead of requiring users to infer behavior through low-level backend properties.

A reasonable first delivery is:

- one consistent startup path that can initialize from a stable position and continue incrementally
- additional startup modes only if they can be implemented and validated cleanly

### 4. Preserve SeaTunnel CDC row contract

The connector should emit rows that preserve:

- correct table identity
- insert/update/delete row-kind semantics
- compatibility with existing CDC metadata population where available
- compatibility with SeaTunnel multi-table sink flows

### 5. Checkpoint / restore correctness is mandatory

The connector should not be considered complete if it only starts once but cannot resume correctly.

The implementation must verify that:

- offsets are serializable
- restore resumes from a stable Vitess CDC position
- restart does not silently lose change events

### 6. Tests and validation

Suggested test layers:

- option parsing / validation tests
- offset serialization / restore tests
- source behavior tests for startup + incremental flow
- at least one runnable integration or E2E validation path

If CI cannot host a full Vitess environment easily, the chosen validation strategy should be documented explicitly instead of left implicit.
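
To make the "offsets are serializable" and "offset serialization / restore tests" items more concrete, here is a minimal sketch of the kind of round-trip check such a test could perform. It is illustrative only: `VitessOffsetSketch` is a hypothetical class, not an existing SeaTunnel or Vitess type, and it assumes the chosen capture backend exposes a position that can be reduced to keyspace, shard, and GTID strings (as a single shard entry of a Vitess VGTID does). The real connector would presumably plug into SeaTunnel's own CDC offset abstractions rather than use plain Java serialization directly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Objects;

/** Hypothetical offset holder for illustration; not part of SeaTunnel or Vitess. */
final class VitessOffsetSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    // Assumption: one (keyspace, shard, gtid) triple is enough to describe a position.
    private final String keyspace;
    private final String shard;
    private final String gtid;

    VitessOffsetSketch(String keyspace, String shard, String gtid) {
        this.keyspace = Objects.requireNonNull(keyspace, "keyspace");
        this.shard = Objects.requireNonNull(shard, "shard");
        this.gtid = Objects.requireNonNull(gtid, "gtid");
    }

    /** Serializes the offset the way a checkpoint snapshot might store it. */
    static byte[] toBytes(VitessOffsetSketch offset) {
        try (ByteArrayOutputStream buffer = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(offset);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Restores an offset from checkpoint bytes. */
    static VitessOffsetSketch fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (VitessOffsetSketch) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof VitessOffsetSketch)) {
            return false;
        }
        VitessOffsetSketch that = (VitessOffsetSketch) other;
        return keyspace.equals(that.keyspace)
                && shard.equals(that.shard)
                && gtid.equals(that.gtid);
    }

    @Override
    public int hashCode() {
        return Objects.hash(keyspace, shard, gtid);
    }

    @Override
    public String toString() {
        return keyspace + "/" + shard + "@" + gtid;
    }

    public static void main(String[] args) {
        // The keyspace, shard, and GTID values below are made up for the example.
        VitessOffsetSketch before =
                new VitessOffsetSketch("commerce", "-80", "MySQL56/example-uuid:1-42");
        VitessOffsetSketch after = fromBytes(toBytes(before));
        System.out.println("restored equals original: " + before.equals(after));
    }
}
```

A round-trip equality check like this only covers the first bullet in section 5. Verifying that a restart resumes from the stored position without losing events would still need a source-level or E2E test against a running Vitess environment.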
## Suggested acceptance criteria

- A new `connector-cdc-vitess` module is added.
- The connector can consume change events for explicitly configured tables using a stable startup position.
- The connector integrates with SeaTunnel checkpoint / restore behavior correctly.
- The connector emits rows compatible with SeaTunnel's existing multi-table CDC runtime.
- English and Chinese docs are added.
- Plugin registration and distribution packaging are updated.
- At least one focused integration/E2E validation path is provided.

## Non-goals

- Vitess sink connector work.
- Dynamic newly-added table discovery.
- New global CDC metadata fields.
- Broad CDC framework refactors not required for Vitess support.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
