I am Dongwoo, and I'm currently working on *GitHub Issue #10203*
(https://github.com/apache/seatunnel/issues/10203).

I am writing to propose a new feature: Dynamic Table Discovery for CDC
Connectors. I have been discussing this with project committers on GitHub.
I initially suggested a MySQL-specific approach. Based on feedback asking
for a more general and extensible design, I have revised the plan to cover
all CDC connectors that implement the DataSourceDialect interface.

I would like to initiate a formal discussion on this general design.


*[Design Proposal] Dynamic Table Discovery for CDC Connectors*

1. Summary

This proposal enables CDC connectors to automatically discover and
synchronize newly created tables that match configured regex patterns
during runtime, without requiring job restarts. The design applies to all
CDC connectors implementing the DataSourceDialect interface (MySQL,
PostgreSQL, MongoDB, etc.).

2. Design Goals

   - Generality: Support all compatible CDC connectors by leveraging the
     existing DataSourceDialect interface.
   - Backward Compatibility: Disabled by default (discover-new-tables=false).
   - Configurable: Users can enable or disable discovery and configure the
     discovery interval.
   - Minimal Changes: Leverage existing architecture and patterns from other
     connectors (Kafka, Pulsar).

3. Key Technical Changes

   - Configuration: Add discover-new-tables and discovery-interval-ms options
     to BaseSourceConfig.
   - Enumerator: Use a ScheduledThreadPoolExecutor in
     IncrementalSourceEnumerator to periodically call
     dialect.discoverDataCollections().
   - SplitAssigner: Add an addNewTables() method to SnapshotSplitAssigner and
     HybridSplitAssigner to process newly discovered tables.
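
To make the enumerator change more concrete, here is a rough, self-contained
sketch of the periodic discovery loop. It is not SeaTunnel code: the class
name TableDiscoveryScheduler is hypothetical, the Supplier stands in for
dialect.discoverDataCollections(), the Consumer stands in for addNewTables(),
and tables are represented as plain strings. The real signatures may differ.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical helper (not part of SeaTunnel): runs a discovery callback on a
// fixed delay and forwards only the tables that have not been seen before.
class TableDiscoveryScheduler implements AutoCloseable {
    private final ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
    private final Set<String> knownTables = ConcurrentHashMap.newKeySet();
    // Assumption: stands in for dialect.discoverDataCollections().
    private final Supplier<Set<String>> discoverer;
    // Assumption: stands in for the proposed addNewTables() method.
    private final Consumer<Set<String>> onNewTables;

    TableDiscoveryScheduler(Set<String> initialTables,
                            Supplier<Set<String>> discoverer,
                            Consumer<Set<String>> onNewTables) {
        this.knownTables.addAll(initialTables);
        this.discoverer = discoverer;
        this.onNewTables = onNewTables;
    }

    // Schedules discovery rounds; intervalMs mirrors discovery-interval-ms.
    void start(long intervalMs) {
        executor.scheduleWithFixedDelay(
                this::discoverOnce, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
    }

    // Runs one discovery round. Synchronized so a manual call cannot
    // interleave with a scheduled one.
    synchronized void discoverOnce() {
        try {
            Set<String> added = new HashSet<>(discoverer.get());
            added.removeAll(knownTables);
            if (!added.isEmpty()) {
                // Hand over first, then remember: a failed hand-over is retried
                // in the next round.
                onNewTables.accept(added);
                knownTables.addAll(added);
            }
        } catch (RuntimeException e) {
            // An exception escaping a scheduled task would suppress all of
            // its subsequent executions, so it is caught and reported here.
            System.err.println("Table discovery failed: " + e.getMessage());
        }
    }

    @Override
    public void close() {
        executor.shutdownNow();
    }
}
```

With discover-new-tables=false the enumerator would simply never call
start(), which keeps the current behavior unchanged.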

4. Workflow

   1. IncrementalSourceEnumerator starts a background thread that
      periodically discovers new tables.
   2. New tables are identified by comparing current metadata with
      context.getCapturedTables().
   3. New tables are added to SplitAssigner via the addNewTables() method.
   4. Splits are generated and assigned to readers for processing.
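
Step 2 of the workflow can be illustrated with a small pure function. This
is only a sketch under assumptions: NewTableDetector and findNewTables are
hypothetical names, tables are modeled as "database.table" strings, and the
regex filter is applied here even though the real dialect may already filter
inside discoverDataCollections().

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical helper (not part of SeaTunnel): selects tables that match the
// configured pattern and are not yet captured by the running job.
class NewTableDetector {
    static Set<String> findNewTables(Set<String> discovered,
                                     Set<String> captured,
                                     Pattern tablePattern) {
        // TreeSet gives a stable, sorted order for logging and tests.
        Set<String> result = new TreeSet<>();
        for (String table : discovered) {
            // matches() requires the whole table name to match the pattern.
            if (!captured.contains(table) && tablePattern.matcher(table).matches()) {
                result.add(table);
            }
        }
        return result;
    }
}
```

For example, with the pattern shop\.orders_.*, a captured set containing
shop.orders_2023, and a discovered set containing shop.orders_2023,
shop.orders_2024 and shop.users, only shop.orders_2024 would be passed on to
addNewTables().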

I would appreciate your feedback on this design. If the community agrees
with this direction, I am ready to start the implementation.


Best regards,

Dongwoo (@dongwooooooo)
