davidzollo opened a new issue, #11049:
URL: https://github.com/apache/seatunnel/issues/11049

   ## Background
   SeaTunnel currently does not provide a native OceanBase CDC source connector 
in the `connector-cdc` family.
   
   That leaves OceanBase users without a first-class path for:
   - initial snapshot + continuous incremental capture
   - checkpoint/restart-safe CDC ingestion
   - multi-table CDC jobs using the same runtime model as other SeaTunnel CDC 
sources
   - downstream schema evolution / multi-table sink integration based on 
SeaTunnel's CDC row model
   
   An earlier discussion exists, but it is stale and no longer gives 
contributors a practical implementation target. This issue is intended to 
replace it with a claimable engineering scope.
   
   ## Scope
   Add a new `connector-cdc-oceanbase` source connector under 
`seatunnel-connectors-v2/connector-cdc`.
   
   This issue is for the **source connector only**.
   
   ## First delivery boundary
   To keep the issue claimable, the first delivery should stay narrow:
   - support snapshot + incremental CDC for explicitly configured tables
   - integrate with SeaTunnel's existing CDC base abstractions where possible
   - support checkpoint / restore correctness
   - support the normal SeaTunnel multi-table CDC row contract
   
   If different OceanBase deployment modes require materially different CDC 
backends, the first delivery should target the path that is stable and testable 
in CI, and explicitly defer additional modes to follow-up work instead of 
blocking the connector.
   
   ## Suggested implementation approach
   ### 1. Choose and isolate the capture backend
   The implementation should start by deciding the CDC capture backend and 
keeping that decision isolated inside the OceanBase connector module.
   
   Practical options may include:
   - an OceanBase-native CDC/log-proxy client path, or
   - a compatible incremental-source path if the target OceanBase deployment 
exposes a stable change-log interface suitable for SeaTunnel's CDC model.
   
   Whichever backend is chosen, the connector should not leak backend-specific 
assumptions into unrelated generic CDC code unless a reusable abstraction is 
clearly justified.
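
One way to keep the backend decision isolated is a small connector-internal seam, roughly as sketched below. Every name here (`ChangeEvent`, `CaptureBackend`, `InMemoryCaptureBackend`) is hypothetical and is not an existing SeaTunnel or OceanBase type. The `long` position is only a placeholder for whatever the chosen backend actually exposes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical change event; not an existing SeaTunnel or OceanBase type. */
final class ChangeEvent {
    final String tableId;
    final long position; // placeholder for a backend-specific log position

    ChangeEvent(String tableId, long position) {
        this.tableId = tableId;
        this.position = position;
    }
}

/** Hypothetical seam: the only capture type the rest of the connector would depend on. */
interface CaptureBackend extends AutoCloseable {
    /** Delivers every event whose position is strictly greater than fromPosition. */
    void start(long fromPosition, Consumer<ChangeEvent> sink);

    @Override
    void close();
}

/** In-memory stand-in, usable in unit tests that run without an OceanBase cluster. */
final class InMemoryCaptureBackend implements CaptureBackend {
    private final List<ChangeEvent> log;

    InMemoryCaptureBackend(List<ChangeEvent> log) {
        this.log = new ArrayList<>(log);
    }

    @Override
    public void start(long fromPosition, Consumer<ChangeEvent> sink) {
        for (ChangeEvent event : log) {
            if (event.position > fromPosition) {
                sink.accept(event);
            }
        }
    }

    @Override
    public void close() {
        // Nothing to release for the in-memory variant.
    }
}
```

A log-proxy-based or other real backend would then be one more implementation of the same interface, so swapping it later should not touch the generic CDC code.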
   
   ### 2. Follow the existing CDC connector module layout
   The new module should be structured similarly to existing SeaTunnel CDC 
connectors and include at least:
   - source options
   - source config / source config factory
   - dialect or connector-specific source adapter
   - offset representation / offset factory
   - snapshot split planning, if the snapshot phase is read incrementally in splits
   - fetch task context / incremental reader integration
   - connector docs and plugin metadata registration
   
   Expected repository touch points include:
   - `seatunnel-connectors-v2/connector-cdc/connector-cdc-oceanbase`
   - `seatunnel-connectors-v2/connector-cdc/pom.xml`
   - `plugin-mapping.properties`
   - `seatunnel-dist/pom.xml`
   - `config/plugin_config`
   - `docs/en` and `docs/zh`
   - `seatunnel-e2e`
   
   ### 3. Keep startup semantics explicit
   The connector should expose SeaTunnel-owned startup semantics instead of 
requiring users to infer behavior through low-level passthrough properties.
   
   A reasonable first delivery is:
   - `initial`: read snapshot, then continue with incremental CDC
   - `latest` (or an equivalent incremental-only mode), if the backend supports 
it safely
   
   If additional startup modes are not yet reliable for OceanBase, they should 
be omitted from the first delivery rather than partially implemented.
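
As a minimal sketch of "omit rather than partially implement", the option parser could reject anything outside the supported set up front. `OceanBaseStartupMode` is a hypothetical name used only for illustration. A real connector would more likely reuse or extend the startup option types already present in the CDC base module.

```java
import java.util.Locale;

/** Hypothetical enum: only the modes the first delivery commits to. */
enum OceanBaseStartupMode {
    INITIAL, // snapshot first, then incremental CDC
    LATEST;  // incremental only, starting from the current position

    /** Parses a user-supplied value and fails fast on unsupported modes. */
    static OceanBaseStartupMode parse(String raw) {
        if (raw == null) {
            throw new IllegalArgumentException("startup mode must be set");
        }
        switch (raw.trim().toLowerCase(Locale.ROOT)) {
            case "initial":
                return INITIAL;
            case "latest":
                return LATEST;
            default:
                throw new IllegalArgumentException(
                        "Unsupported startup mode '" + raw + "'; supported: initial, latest");
        }
    }
}
```

Failing at option validation time keeps an unsupported mode from silently falling back to some backend default.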
   
   ### 4. Preserve SeaTunnel CDC row semantics
   The connector should emit rows that fit SeaTunnel's CDC runtime expectations:
   - correct table identity for multi-table jobs
   - row kind semantics aligned with insert/update/delete handling
   - existing metadata population where applicable, such as database/table 
identifiers and CDC timing fields already used by current connectors
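
The row-kind expectation can be sketched as follows. The local `RowKind` enum is a stand-in that mirrors, as far as I understand it, the four kinds of SeaTunnel's own `RowKind`. `ChangeOp`, `CdcRow`, and `toRows` are hypothetical names for illustration only, and real code would emit SeaTunnel rows rather than this simplified holder.

```java
import java.util.ArrayList;
import java.util.List;

final class RowKindMappingSketch {
    /** Local stand-in for SeaTunnel's RowKind (assumed to have these four kinds). */
    enum RowKind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

    /** Hypothetical backend-level operation type. */
    enum ChangeOp { INSERT, UPDATE, DELETE }

    /** Simplified row holder carrying table identity alongside the row kind. */
    static final class CdcRow {
        final String tableId;
        final RowKind kind;
        final Object[] fields;

        CdcRow(String tableId, RowKind kind, Object[] fields) {
            this.tableId = tableId;
            this.kind = kind;
            this.fields = fields;
        }
    }

    /** Maps one backend change to one or two rows; an update yields a before/after pair. */
    static List<CdcRow> toRows(String tableId, ChangeOp op, Object[] before, Object[] after) {
        List<CdcRow> rows = new ArrayList<>();
        switch (op) {
            case INSERT:
                rows.add(new CdcRow(tableId, RowKind.INSERT, after));
                break;
            case UPDATE:
                rows.add(new CdcRow(tableId, RowKind.UPDATE_BEFORE, before));
                rows.add(new CdcRow(tableId, RowKind.UPDATE_AFTER, after));
                break;
            case DELETE:
                rows.add(new CdcRow(tableId, RowKind.DELETE, before));
                break;
            default:
                throw new IllegalArgumentException("Unknown operation: " + op);
        }
        return rows;
    }
}
```

Carrying the table identifier on every row is what lets a multi-table job route events downstream without guessing.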
   
   ### 5. Checkpoint / restore correctness is mandatory
   The connector should not be considered complete if it only starts 
successfully but cannot resume safely.
   
   The implementation must verify that:
   - offsets/checkpoints are serializable
   - restore resumes from a stable OceanBase CDC position
   - restart does not silently skip or duplicate incremental events beyond 
documented guarantees
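
A serialization round trip is the cheapest of these checks to automate. The sketch below assumes, purely for illustration, that an OceanBase CDC position can be described by a commit timestamp plus a sequence number. The real fields depend on the chosen backend, and `OffsetSketch` is a hypothetical class, not part of SeaTunnel. It also uses plain Java serialization, which may differ from the serializer the connector ends up using.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Objects;

/** Hypothetical offset; the two fields are assumptions, not the real OceanBase position. */
final class OffsetSketch implements Serializable, Comparable<OffsetSketch> {
    private static final long serialVersionUID = 1L;

    final long commitTimestamp;
    final long sequence;

    OffsetSketch(long commitTimestamp, long sequence) {
        this.commitTimestamp = commitTimestamp;
        this.sequence = sequence;
    }

    /** Orders by timestamp first, then by sequence, so restore can compare positions. */
    @Override
    public int compareTo(OffsetSketch other) {
        int byTimestamp = Long.compare(commitTimestamp, other.commitTimestamp);
        return byTimestamp != 0 ? byTimestamp : Long.compare(sequence, other.sequence);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof OffsetSketch)) {
            return false;
        }
        OffsetSketch that = (OffsetSketch) o;
        return commitTimestamp == that.commitTimestamp && sequence == that.sequence;
    }

    @Override
    public int hashCode() {
        return Objects.hash(commitTimestamp, sequence);
    }

    /** Serializes the offset the way a checkpoint would need to persist it. */
    static byte[] serialize(OffsetSketch offset) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(offset);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Restores an offset from checkpointed bytes. */
    static OffsetSketch deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (OffsetSketch) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A unit test can then assert that `deserialize(serialize(offset))` equals the original offset and that ordering survives the round trip. Skipped or duplicated events after restart still need an integration-level check.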
   
   ### 6. Tests and validation
   This issue needs more than unit tests.
   
   Suggested test layers:
   - option parsing / validation tests
   - offset serialization / restore tests
   - source behavior tests for snapshot + incremental flow
   - at least one runnable integration or E2E path
   
   If CI cannot host a full OceanBase cluster easily, the issue body or 
implementation notes should document the chosen validation strategy explicitly 
instead of silently skipping end-to-end verification.
   
   ## Suggested acceptance criteria
   - A new `connector-cdc-oceanbase` module is added.
   - The connector can read snapshot data and continue with incremental CDC for 
configured tables.
   - The connector integrates with SeaTunnel checkpoint / restore semantics 
correctly.
   - Multi-table capture is supported for explicitly configured tables.
   - English and Chinese docs are added.
   - Plugin registration and distribution packaging are updated.
   - At least one focused integration/E2E validation path is provided.
   
   ## Non-goals
   - OceanBase sink connector work.
   - Dynamic newly-added table discovery.
   - New global CDC metadata fields.
   - Broad CDC framework refactors not required for OceanBase support.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
