davidzollo opened a new issue, #11050:
URL: https://github.com/apache/seatunnel/issues/11050

   ## Background
   SeaTunnel currently does not provide a native Vitess CDC source connector in the `connector-cdc` family.
   
   That leaves Vitess users without a first-class way to:
   - capture initial table state and incremental changes in one SeaTunnel source
   - participate in SeaTunnel checkpoint / restart semantics with a stable CDC position
   - feed multi-table CDC rows into downstream multi-table sinks and schema evolution flows
   
   There has been earlier community discussion and even partial prototype work, but the old issue is too stale to serve as a practical implementation ticket. This issue replaces it with a scoped, claimable engineering task.
   
   ## Scope
   Add a new `connector-cdc-vitess` source connector under `seatunnel-connectors-v2/connector-cdc`.
   
   This issue is for the **source connector only**.
   
   ## First delivery boundary
   To keep the issue implementable, the first delivery should stay narrow:
   - support explicitly configured tables or table patterns that can be resolved deterministically
   - support a stable initial startup position plus continuous incremental capture
   - integrate with SeaTunnel checkpoint / restore semantics
   - emit SeaTunnel CDC rows compatible with existing multi-table downstream paths
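To make "resolved deterministically" concrete, here is a minimal JDK-only sketch. It assumes each configured entry is a regular expression matched against fully qualified `keyspace.table` names already discovered from Vitess; `TableSelector` and `resolve` are hypothetical names, not existing SeaTunnel code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Pattern;

/** Hypothetical helper: resolves configured table patterns against discovered tables. */
final class TableSelector {
    private TableSelector() {}

    /**
     * Returns the discovered tables matched by at least one pattern, sorted by name.
     * Each pattern must match at least one table, otherwise the configuration is rejected.
     */
    static List<String> resolve(Collection<String> discoveredTables, List<String> patterns) {
        // TreeSet yields a sorted, duplicate-free result regardless of discovery order.
        TreeSet<String> selected = new TreeSet<>();
        for (String pattern : patterns) {
            Pattern compiled = Pattern.compile(pattern);
            boolean matchedAny = false;
            for (String table : discoveredTables) {
                // matches() requires the whole name to match, not just a substring.
                if (compiled.matcher(table).matches()) {
                    selected.add(table);
                    matchedAny = true;
                }
            }
            if (!matchedAny) {
                throw new IllegalArgumentException("No table matches pattern: " + pattern);
            }
        }
        return new ArrayList<>(selected);
    }
}
```

Sorting and rejecting unmatched patterns means two runs against the same schema capture the same table set, and a typo surfaces at startup rather than as a silently empty stream. A literal table name whose `.` should not act as a wildcard can be wrapped with `Pattern.quote`.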
   
   If some Vitess deployment variants require materially different capture behavior, the first delivery should target one well-defined, reproducible path and explicitly defer broader compatibility.
   
   ## Suggested implementation approach
   ### 1. Isolate the Vitess capture backend inside its own module
   Choose the Vitess change-capture backend first, and keep backend-specific assumptions inside `connector-cdc-vitess`.
   
   The connector should not force unrelated generic CDC code to become Vitess-aware unless a reusable abstraction is clearly justified.
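One way to keep the backend isolated is a narrow interface that generic code depends on, with all Vitess-specific types confined to its implementations. The sketch below is hypothetical (`ChangeCaptureBackend`, `RawChange`, and `ChangeDrainer` are illustrative names), uses only the JDK, and does not describe any existing SeaTunnel abstraction.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Backend-neutral change record; `position` is an opaque, backend-specific resume token. */
final class RawChange {
    final String table;
    final String op;
    final String position;

    RawChange(String table, String op, String position) {
        this.table = table;
        this.op = op;
        this.position = position;
    }
}

/** Hypothetical seam: only implementations of this interface may know about Vitess. */
interface ChangeCaptureBackend extends AutoCloseable {
    /** Opens a change stream; a null resumeToken means "start fresh". */
    Iterator<RawChange> open(String resumeToken);

    @Override
    void close(); // narrowed: no checked exception
}

/** Generic code depends only on the interface, so it stays Vitess-agnostic. */
final class ChangeDrainer {
    private ChangeDrainer() {}

    /** Reads every available change and always closes the backend afterwards. */
    static List<String> drain(ChangeCaptureBackend backend, String resumeToken) {
        List<String> seen = new ArrayList<>();
        try (ChangeCaptureBackend b = backend) {
            Iterator<RawChange> it = b.open(resumeToken);
            while (it.hasNext()) {
                RawChange c = it.next();
                seen.add(c.table + ":" + c.op + "@" + c.position);
            }
        }
        return seen;
    }
}
```

Whatever backend is chosen would implement `ChangeCaptureBackend`, while unit tests can pass an in-memory implementation and exercise the generic read path without a Vitess cluster.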
   
   ### 2. Follow existing CDC connector module layout
   The module should include at least:
   - connector-owned source options
   - source config / config factory
   - Vitess-specific source adapter or dialect layer
   - offset representation / offset factory
   - startup behavior integration
   - docs and plugin metadata registration
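The "source options" plus "config factory" pair can be sketched in plain Java: raw user options are validated once and turned into an immutable typed config. The option keys (`hostname`, `port`, `keyspace`, `table-names`) and the class name are assumptions for illustration only; a real module would declare its options through SeaTunnel's own option API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical typed config for the Vitess CDC source; option keys are illustrative only. */
final class VitessSourceConfig {
    static final String HOSTNAME = "hostname";
    static final String PORT = "port";
    static final String KEYSPACE = "keyspace";
    static final String TABLE_NAMES = "table-names";

    final String hostname;
    final int port;
    final String keyspace;
    final List<String> tableNames;

    private VitessSourceConfig(String hostname, int port, String keyspace, List<String> tableNames) {
        this.hostname = hostname;
        this.port = port;
        this.keyspace = keyspace;
        this.tableNames = tableNames;
    }

    /** Validates raw options and builds an immutable config, reporting all problems at once. */
    static VitessSourceConfig fromMap(Map<String, String> raw) {
        List<String> errors = new ArrayList<>();
        String host = trimToNull(raw.get(HOSTNAME));
        if (host == null) errors.add(HOSTNAME + " is required");

        int port = -1;
        String portText = trimToNull(raw.get(PORT));
        if (portText == null) {
            errors.add(PORT + " is required");
        } else {
            try {
                port = Integer.parseInt(portText);
                if (port < 1 || port > 65535) errors.add(PORT + " must be in [1, 65535]");
            } catch (NumberFormatException e) {
                errors.add(PORT + " must be an integer");
            }
        }

        String keyspace = trimToNull(raw.get(KEYSPACE));
        if (keyspace == null) errors.add(KEYSPACE + " is required");

        // Comma-separated list; blank entries are ignored.
        List<String> tables = new ArrayList<>();
        String tablesText = raw.get(TABLE_NAMES);
        if (tablesText != null) {
            for (String t : tablesText.split(",")) {
                if (!t.trim().isEmpty()) tables.add(t.trim());
            }
        }
        if (tables.isEmpty()) errors.add(TABLE_NAMES + " must list at least one table");

        if (!errors.isEmpty()) throw new IllegalArgumentException(String.join("; ", errors));
        return new VitessSourceConfig(host, port, keyspace, Collections.unmodifiableList(tables));
    }

    private static String trimToNull(String s) {
        if (s == null) return null;
        String t = s.trim();
        return t.isEmpty() ? null : t;
    }
}
```

Collecting every validation error before throwing gives users one complete message instead of a fix-one-and-rerun loop, and it gives the option-parsing test layer a single entry point to target.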
   
   Expected repository touch points include:
   - `seatunnel-connectors-v2/connector-cdc/connector-cdc-vitess`
   - `seatunnel-connectors-v2/connector-cdc/pom.xml`
   - `plugin-mapping.properties`
   - `seatunnel-dist/pom.xml`
   - `config/plugin_config`
   - `docs/en` and `docs/zh`
   - `seatunnel-e2e`
   
   ### 3. Keep startup semantics explicit
   The connector should expose SeaTunnel-owned startup semantics instead of requiring users to infer behavior from low-level backend properties.
   
   A reasonable first delivery is:
   - one consistent startup path that can initialize from a stable position and continue incrementally
   - additional startup modes only if they can be implemented and validated cleanly
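A minimal sketch of SeaTunnel-facing startup semantics for such a first delivery: a default mode that snapshots then streams, with later modes rejected up front until they are validated. `VitessStartupMode` and its constants are hypothetical names chosen for illustration.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical SeaTunnel-facing startup modes for the Vitess source. */
enum VitessStartupMode {
    /** Snapshot current table state, then stream from the position captured with the snapshot. */
    INITIAL,
    /** Skip the snapshot and stream only changes after the current position. */
    LATEST,
    /** Stream from a user-supplied position. */
    SPECIFIC;

    /** Modes this first delivery actually implements and validates. */
    private static final Set<VitessStartupMode> SUPPORTED = EnumSet.of(INITIAL);

    /** Parses a user value; null defaults to INITIAL, and unimplemented modes fail fast. */
    static VitessStartupMode parse(String value) {
        if (value == null) {
            return INITIAL;
        }
        VitessStartupMode mode;
        try {
            mode = valueOf(value.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("Unknown startup mode: " + value, e);
        }
        if (!SUPPORTED.contains(mode)) {
            throw new UnsupportedOperationException(
                    "Startup mode " + mode + " is not supported yet; supported: " + SUPPORTED);
        }
        return mode;
    }
}
```

Declaring later modes but rejecting them keeps the option's vocabulary stable while making the unsupported cases explicit; simply leaving them out until they are validated is an equally reasonable choice.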
   
   ### 4. Preserve SeaTunnel CDC row contract
   The connector should emit rows that preserve:
   - correct table identity
   - insert/update/delete row-kind semantics
   - compatibility with existing CDC metadata population where available
   - compatibility with SeaTunnel multi-table sink flows
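SeaTunnel CDC rows carry one of four row kinds (`INSERT`, `UPDATE_BEFORE`, `UPDATE_AFTER`, `DELETE`). The sketch below shows one way to map backend operations onto them while keeping table identity on every row. It assumes the backend delivers full before/after row images and uses a `keyspace.table` string as table identity; the enums and classes are local stand-ins, not SeaTunnel's own types.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Local stand-in for the four row kinds carried by SeaTunnel CDC rows. */
enum RowKind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

/** Hypothetical change operation as delivered by the capture backend. */
enum ChangeOp { INSERT, UPDATE, DELETE }

/** Hypothetical output row: table identity, row kind, and column values. */
final class CdcRow {
    final String tableId;
    final RowKind kind;
    final List<Object> fields;

    CdcRow(String tableId, RowKind kind, List<Object> fields) {
        this.tableId = tableId;
        this.kind = kind;
        this.fields = fields;
    }
}

final class RowKindMapper {
    private RowKindMapper() {}

    /** Maps one backend change to rows; an update becomes a before/after pair. */
    static List<CdcRow> map(String tableId, ChangeOp op, List<Object> before, List<Object> after) {
        switch (op) {
            case INSERT:
                return Collections.singletonList(new CdcRow(tableId, RowKind.INSERT, require(after, "after")));
            case UPDATE:
                return Arrays.asList(
                        new CdcRow(tableId, RowKind.UPDATE_BEFORE, require(before, "before")),
                        new CdcRow(tableId, RowKind.UPDATE_AFTER, require(after, "after")));
            case DELETE:
                return Collections.singletonList(new CdcRow(tableId, RowKind.DELETE, require(before, "before")));
            default:
                throw new IllegalArgumentException("Unsupported operation: " + op);
        }
    }

    /** Fails loudly if the backend did not provide the row image this operation needs. */
    private static List<Object> require(List<Object> image, String which) {
        if (image == null) {
            throw new IllegalArgumentException("missing " + which + " image");
        }
        return image;
    }
}
```

Emitting an update as an `UPDATE_BEFORE`/`UPDATE_AFTER` pair keeps both the old and new values available downstream, which matters when a sink must locate the old row, for example after a primary-key change.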
   
   ### 5. Checkpoint / restore correctness is mandatory
   The connector should not be considered complete if it can start but cannot resume correctly after a restart.
   
   The implementation must verify that:
   - offsets are serializable
   - restore resumes from a stable Vitess CDC position
   - restart does not silently lose change events
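If the chosen backend were Vitess VStream, a natural stable position would be a VGTID, which holds one GTID position per (keyspace, shard). That backend choice is an assumption here, since this issue leaves it open. The sketch shows a versioned, deterministic byte encoding of such an offset that round-trips exactly; `VitessOffset` and `ShardGtid` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

/** One (keyspace, shard) position; the gtid string is treated as opaque. */
final class ShardGtid {
    final String keyspace;
    final String shard;
    final String gtid;

    ShardGtid(String keyspace, String shard, String gtid) {
        this.keyspace = keyspace;
        this.shard = shard;
        this.gtid = gtid;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ShardGtid)) return false;
        ShardGtid other = (ShardGtid) o;
        return keyspace.equals(other.keyspace) && shard.equals(other.shard) && gtid.equals(other.gtid);
    }

    @Override
    public int hashCode() {
        return Objects.hash(keyspace, shard, gtid);
    }
}

/** Hypothetical checkpointable offset: a VGTID-like list of per-shard positions. */
final class VitessOffset {
    private static final int FORMAT_VERSION = 1;
    final List<ShardGtid> shardGtids;

    VitessOffset(List<ShardGtid> shardGtids) {
        List<ShardGtid> sorted = new ArrayList<>(shardGtids);
        // Sort so equal offsets always serialize to identical bytes.
        sorted.sort(Comparator.comparing((ShardGtid g) -> g.keyspace).thenComparing(g -> g.shard));
        this.shardGtids = Collections.unmodifiableList(sorted);
    }

    byte[] serialize() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(FORMAT_VERSION);
            out.writeInt(shardGtids.size());
            for (ShardGtid g : shardGtids) {
                // writeUTF rejects strings whose modified UTF-8 form exceeds 65535 bytes.
                out.writeUTF(g.keyspace);
                out.writeUTF(g.shard);
                out.writeUTF(g.gtid);
            }
        }
        return bytes.toByteArray();
    }

    static VitessOffset deserialize(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int version = in.readInt();
            if (version != FORMAT_VERSION) throw new IOException("Unsupported offset format version: " + version);
            int count = in.readInt();
            if (count < 0) throw new IOException("Corrupt offset: negative entry count");
            List<ShardGtid> entries = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                entries.add(new ShardGtid(in.readUTF(), in.readUTF(), in.readUTF()));
            }
            return new VitessOffset(entries);
        }
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof VitessOffset && shardGtids.equals(((VitessOffset) o).shardGtids);
    }

    @Override
    public int hashCode() {
        return shardGtids.hashCode();
    }
}
```

The version header leaves room to evolve the format without breaking restore from older checkpoints, and the sorted entries make "serialize, deserialize, compare" a straightforward offset test.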
   
   ### 6. Tests and validation
   Suggested test layers:
   - option parsing / validation tests
   - offset serialization / restore tests
   - source behavior tests for startup + incremental flow
   - at least one runnable integration or E2E validation path
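For the restart layer, a useful property to assert is that restoring from the last checkpointed position replays everything after it: duplicates are acceptable in this simulation, but a change that reaches the sink zero times is a failure. This is a minimal backend-free sketch in which list indices stand in for the real offset type; `RestartHarness` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

/** Hypothetical harness; list indices stand in for real CDC positions. */
final class RestartHarness {
    private RestartHarness() {}

    /** Emits log[startPos, stopBefore) to the sink and returns the next position to read. */
    static int read(List<String> log, int startPos, int stopBefore, List<String> sink) {
        int pos = startPos;
        while (pos < stopBefore && pos < log.size()) {
            sink.add(log.get(pos));
            pos++;
        }
        return pos;
    }

    /** Reads to a checkpoint, keeps reading, "crashes", then restores from the checkpoint. */
    static List<String> runWithCrash(List<String> log, int checkpointAt, int crashAt) {
        List<String> sink = new ArrayList<>();
        int checkpointed = read(log, 0, checkpointAt, sink); // only this position survives the crash
        read(log, checkpointed, crashAt, sink);              // emitted but not yet checkpointed
        read(log, checkpointed, log.size(), sink);           // restore path replays from the checkpoint
        return sink;
    }

    /** True if every change in the log reached the sink at least once. */
    static boolean noLoss(List<String> log, List<String> sink) {
        return new HashSet<>(sink).containsAll(log);
    }
}
```

A real test would swap the list for the connector's reader and offset, but the check stays the same: the restored run must cover every change after the checkpoint.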
   
   If CI cannot easily host a full Vitess environment, document the chosen validation strategy explicitly instead of leaving it implicit.
   
   ## Suggested acceptance criteria
   - A new `connector-cdc-vitess` module is added.
   - The connector can consume change events for explicitly configured tables using a stable startup position.
   - The connector integrates correctly with SeaTunnel checkpoint / restore behavior.
   - The connector emits rows compatible with SeaTunnel's existing multi-table CDC runtime.
   - English and Chinese docs are added.
   - Plugin registration and distribution packaging are updated.
   - At least one focused integration/E2E validation path is provided.
   
   ## Non-goals
   - Vitess sink connector work.
   - Dynamic newly-added table discovery.
   - New global CDC metadata fields.
   - Broad CDC framework refactors not required for Vitess support.
   

