Re: [DISCUSS] Support Dynamic Table Discovery for CDC Connectors

zhangshenghang Mon, 29 Dec 2025 18:39:00 -0800

Thanks, 이동우, this is a good feature.

Regards,
Jast (Shenghang)



이동우 <[email protected]> wrote on Sun, Dec 28, 2025 at 23:02:

> I am Dongwoo, and I'm currently working on *GitHub Issue #10203* (
> https://github.com/apache/seatunnel/issues/10203).
>
> I am writing to propose a new feature: Dynamic Table Discovery for CDC
> Connectors. I have been discussing this with project committers on GitHub.
> Initially, I suggested a MySQL-specific approach, but based on feedback
> that the design should be more general and extensible for the whole
> community, I have refined the plan to support all CDC connectors that
> implement the DataSourceDialect interface.
>
> I would like to initiate a formal discussion on this general design.
>
>
> *[Design Proposal] Dynamic Table Discovery for CDC Connectors*
> 1. Summary
>
> This proposal enables CDC connectors to automatically discover and
> synchronize newly created tables that match the configured regex patterns
> at runtime, without requiring a job restart. The design applies to all
> CDC connectors that implement the DataSourceDialect interface (MySQL,
> PostgreSQL, MongoDB, etc.).
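The sketch below shows the "match configured regex patterns" step in Java. It filters discovered table identifiers against a list of patterns. The class name `TablePatternFilter` and the `database.table` identifier format are assumptions made for illustration. They are not existing SeaTunnel APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper (not a SeaTunnel API): keeps only the table identifiers
// that fully match at least one configured regex pattern.
final class TablePatternFilter {
    private final List<Pattern> patterns;

    TablePatternFilter(List<String> regexes) {
        // Compile once so that periodic discovery does not recompile on every run.
        this.patterns = regexes.stream().map(Pattern::compile).collect(Collectors.toList());
    }

    boolean matches(String tableId) {
        // matches() requires the whole identifier to match, not just a substring.
        return patterns.stream().anyMatch(p -> p.matcher(tableId).matches());
    }

    List<String> filter(List<String> discoveredTableIds) {
        List<String> result = new ArrayList<>();
        for (String id : discoveredTableIds) {
            if (matches(id)) {
                result.add(id);
            }
        }
        return result;
    }
}
```

Here is a usage example. `new TablePatternFilter(List.of("shop\\.orders_\\d{4}")).filter(List.of("shop.orders_2025", "shop.customers"))` keeps only `shop.orders_2025`. Full-string matching prevents a pattern from accidentally capturing tables such as `shop.orders_2025_bak`.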
> 2. Design Goals
>
>    - Generality: Support all compatible CDC connectors by leveraging the
>      existing DataSourceDialect interface.
>    - Backward Compatibility: Disabled by default (discover-new-tables=false).
>    - Configurable: Users can enable or disable discovery and configure the
>      discovery interval.
>    - Minimal Changes: Reuse the existing architecture and the patterns
>      already used by other connectors (Kafka, Pulsar).
>
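The next sketch shows how the two options might be read with backward-compatible defaults. It assumes the options arrive as a plain string map. `DiscoveryOptions` is a hypothetical class, not the real BaseSourceConfig. The 30-second default interval is also an illustrative assumption, because the proposal does not specify a default.

```java
import java.util.Map;

// Hypothetical holder for the two proposed options (not the real BaseSourceConfig).
final class DiscoveryOptions {
    static final String DISCOVER_NEW_TABLES = "discover-new-tables";
    static final String DISCOVERY_INTERVAL_MS = "discovery-interval-ms";
    // Assumed default for illustration only; the proposal does not fix a value.
    static final long DEFAULT_INTERVAL_MS = 30_000L;

    final boolean discoverNewTables;
    final long discoveryIntervalMs;

    DiscoveryOptions(Map<String, String> raw) {
        // A missing key means false, so existing jobs keep their current behavior.
        this.discoverNewTables =
                Boolean.parseBoolean(raw.getOrDefault(DISCOVER_NEW_TABLES, "false"));
        long interval = Long.parseLong(
                raw.getOrDefault(DISCOVERY_INTERVAL_MS, String.valueOf(DEFAULT_INTERVAL_MS)));
        // Reject non-positive intervals early instead of failing inside the scheduler.
        if (interval <= 0) {
            throw new IllegalArgumentException(
                    DISCOVERY_INTERVAL_MS + " must be positive, got " + interval);
        }
        this.discoveryIntervalMs = interval;
    }
}
```

Because the feature defaults to off, a job that never sets `discover-new-tables` behaves exactly as it does today.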
> 3. Key Technical Changes
>
>    - Configuration: Add discover-new-tables and discovery-interval-ms options
>      to BaseSourceConfig.
>    - Enumerator: Use a ScheduledThreadPoolExecutor in
>      IncrementalSourceEnumerator to periodically call
>      dialect.discoverDataCollections().
>    - SplitAssigner: Add an addNewTables() method to SnapshotSplitAssigner and
>      HybridSplitAssigner to process newly discovered tables.
>
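The sketch below is a rough, self-contained version of the `addNewTables()` step on the split-assigner side. Several parts are assumptions for illustration: `SimpleSnapshotSplitAssigner` is a hypothetical name, table IDs are plain strings, and each table gets exactly one split. The real SnapshotSplitAssigner chunks a table into many splits and tracks more state.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical, heavily simplified assigner: one snapshot split per table.
final class SimpleSnapshotSplitAssigner {
    private final Set<String> knownTables = new LinkedHashSet<>();
    private final Deque<String> pendingSplits = new ArrayDeque<>();

    SimpleSnapshotSplitAssigner(Collection<String> initialTables) {
        addNewTables(initialTables);
    }

    // Called by the enumerator with newly discovered tables. Synchronized because
    // discovery runs on a background thread while readers request splits.
    synchronized List<String> addNewTables(Collection<String> tables) {
        List<String> added = new ArrayList<>();
        for (String table : tables) {
            // Set.add returns false for duplicates, so re-discovery is idempotent.
            if (knownTables.add(table)) {
                pendingSplits.addLast("snapshot-split:" + table);
                added.add(table);
            }
        }
        return added;
    }

    // Hands out the next pending split, or empty when nothing is queued.
    synchronized Optional<String> getNext() {
        return Optional.ofNullable(pendingSplits.pollFirst());
    }

    synchronized Set<String> capturedTables() {
        return new LinkedHashSet<>(knownTables);
    }
}
```

Making `addNewTables()` idempotent means a discovery round that sees a table twice, or races with startup, does not produce duplicate snapshot splits.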
> 4. Workflow
>
>    1. IncrementalSourceEnumerator starts a background thread that
>       periodically discovers new tables.
>    2. New tables are identified by comparing the current metadata with
>       context.getCapturedTables().
>    3. New tables are added to the SplitAssigner via the addNewTables() method.
>    4. Splits are generated and assigned to readers for processing.
>
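The four workflow steps could be wired together roughly as shown below. This sketch substitutes a plain `Supplier<Set<String>>` for `dialect.discoverDataCollections()`, a `Consumer` for `addNewTables()`, and a local set for `context.getCapturedTables()`. The class name `TableDiscoveryTask` and all signatures here are hypothetical, not SeaTunnel's actual API.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical periodic discovery loop, as an enumerator might run it.
final class TableDiscoveryTask implements AutoCloseable {
    private final Supplier<Set<String>> discover;    // stands in for dialect.discoverDataCollections()
    private final Consumer<Set<String>> onNewTables; // stands in for splitAssigner.addNewTables()
    private final Set<String> captured;              // stands in for context.getCapturedTables()
    private final ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);

    TableDiscoveryTask(Set<String> initiallyCaptured,
                       Supplier<Set<String>> discover,
                       Consumer<Set<String>> onNewTables) {
        this.captured = new HashSet<>(initiallyCaptured);
        this.discover = discover;
        this.onNewTables = onNewTables;
    }

    // Step 1: a background thread runs discovery every intervalMs milliseconds.
    // Fixed delay (not fixed rate) so a slow metadata query never overlaps the next run.
    void start(long intervalMs) {
        executor.scheduleWithFixedDelay(this::runOnce, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
    }

    synchronized void runOnce() {
        try {
            // Step 2: diff current metadata against the already captured tables.
            Set<String> fresh = new HashSet<>(discover.get());
            fresh.removeAll(captured);
            if (!fresh.isEmpty()) {
                // Step 3: hand new tables to the assigner first; mark them captured
                // only afterwards, so a failed hand-off is retried on the next round.
                onNewTables.accept(fresh);
                captured.addAll(fresh);
            }
        } catch (RuntimeException e) {
            // An uncaught exception would suppress all later runs of a scheduled task.
            System.err.println("Table discovery failed: " + e);
        }
    }

    @Override
    public void close() {
        executor.shutdownNow();
    }
}
```

In an enumerator, `start()` would presumably be called only when discover-new-tables is enabled, and `close()` would be called when the enumerator shuts down. Step 4, split generation and assignment, then happens inside the assigner as usual.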
> I would appreciate your feedback on this design. If the community agrees
> with this direction, I am ready to start the implementation.
>
>
> Best regards,
>
> Dongwoo (@dongwooooooo)
>
