I am Dongwoo, and I'm currently working on *GitHub Issue #10203* (https://github.com/apache/seatunnel/issues/10203).

I am writing to propose a new feature: Dynamic Table Discovery for CDC Connectors. I have been discussing this with project committers on GitHub. Initially, I suggested a MySQL-specific approach, but based on the feedback to make the design more general and extensible for the entire community, I have refined the plan to support all CDC connectors that implement the DataSourceDialect interface. I would like to initiate a formal discussion on this general design.

*[Design Proposal] Dynamic Table Discovery for CDC Connectors*

1. Summary

This proposal enables CDC connectors to automatically discover and synchronize newly created tables that match configured regex patterns during runtime, without requiring job restarts. The design applies to all CDC connectors implementing the DataSourceDialect interface (MySQL, PostgreSQL, MongoDB, etc.).

2. Design Goals

- Generality: Support all compatible CDC connectors by leveraging the existing DataSourceDialect interface.
- Backward Compatibility: Disabled by default (discover-new-tables=false).
- Configurable: Users can enable/disable discovery and configure the discovery interval.
- Minimal Changes: Leverage existing architecture and patterns from other connectors (Kafka, Pulsar).

3. Key Technical Changes

- Configuration: Add discover-new-tables and discovery-interval-ms options to BaseSourceConfig.
- Enumerator: Implement ScheduledThreadPoolExecutor in IncrementalSourceEnumerator to periodically call dialect.discoverDataCollections().
- SplitAssigner: Add addNewTables() method to SnapshotSplitAssigner and HybridSplitAssigner to process newly discovered tables.

4. Workflow

   1. IncrementalSourceEnumerator starts a background thread that periodically discovers new tables.
   2. New tables are identified by comparing current metadata with context.getCapturedTables().
   3. New tables are added to SplitAssigner via addNewTables() method.
   4. Splits are generated and assigned to readers for processing.

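To make the workflow above easier to discuss, here is a minimal Java sketch of the discovery loop. It is illustrative only and uses no SeaTunnel classes: TableDiscoverySketch is a hypothetical name, the Supplier stands in for dialect.discoverDataCollections(), the Consumer stands in for addNewTables(), and the internal captured set stands in for context.getCapturedTables(). The real signatures in the codebase may differ.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;
import java.util.regex.Pattern;

/** Hypothetical sketch of periodic table discovery; not SeaTunnel code. */
final class TableDiscoverySketch implements AutoCloseable {
    // Assumption: stands in for dialect.discoverDataCollections().
    private final Supplier<List<String>> discoverer;
    // Assumption: stands in for SplitAssigner.addNewTables().
    private final Consumer<Set<String>> splitAssigner;
    private final Pattern tablePattern;
    // Assumption: stands in for context.getCapturedTables().
    private final Set<String> captured = new HashSet<>();
    private final ScheduledThreadPoolExecutor scheduler =
            new ScheduledThreadPoolExecutor(1, runnable -> {
                Thread thread = new Thread(runnable, "table-discovery");
                thread.setDaemon(true); // do not keep the JVM alive
                return thread;
            });

    TableDiscoverySketch(Supplier<List<String>> discoverer,
                         Consumer<Set<String>> splitAssigner,
                         String tableRegex,
                         Set<String> initiallyCaptured) {
        this.discoverer = discoverer;
        this.splitAssigner = splitAssigner;
        this.tablePattern = Pattern.compile(tableRegex);
        this.captured.addAll(initiallyCaptured);
    }

    /** Starts the background task; the interval mirrors discovery-interval-ms. */
    void start(long discoveryIntervalMs) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                discoverOnce();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs.
                System.err.println("Table discovery failed: " + e);
            }
        }, discoveryIntervalMs, discoveryIntervalMs, TimeUnit.MILLISECONDS);
    }

    /** Runs one discovery pass and returns the tables that were newly added. */
    synchronized Set<String> discoverOnce() {
        Set<String> newTables = new LinkedHashSet<>();
        for (String table : discoverer.get()) {
            boolean matches = tablePattern.matcher(table).matches();
            if (matches && !captured.contains(table)) {
                newTables.add(table);
            }
        }
        if (!newTables.isEmpty()) {
            // Hand over first, then mark as captured, so a failed
            // hand-over is retried on the next pass.
            splitAssigner.accept(newTables);
            captured.addAll(newTables);
        }
        return newTables;
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

In this sketch a table is marked as captured only after it has been handed to the assigner, and the periodic task catches runtime exceptions because scheduleWithFixedDelay stops rescheduling a task once it throws. How discovered tables should interact with checkpoint state is not covered here.
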
I would appreciate your feedback on this design. If the community agrees with this direction, I am ready to start the implementation.

Best regards,
Dongwoo (@dongwooooooo)
