yuxiqian opened a new pull request, #3801: URL: https://github.com/apache/flink-cdc/pull/3801
This closes FLINK-36763 and FLINK-36690.

As explained in #3680, the current pipeline design doesn't cooperate well with tables whose data and schema change events are spread across different partitions, i.e. [distributed tables](https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute). Unfortunately, some data sources (like Kafka) are naturally distributed in this way and cannot easily be integrated into the current pipeline framework. To resolve this issue while keeping backwards compatibility, the following changes have been made:

1. Added another suite of `SchemaOperator` and `SchemaCoordinator` for the distributed topology (see the composer sketch after this list).
   * The previous operators remain in the `schema.regular` package, while the new ones live in the `schema.distributed` package.
   * Common code has been pulled up into an abstract base class `SchemaRegistry` to reduce duplication.
2. Added a new `@Experimental` optional method to `DataSource` to switch between the two topologies:

   ```java
   @PublicEvolving
   public interface DataSource {
       // ...
       @Experimental
       default boolean canContainDistributedTables() {
           return false;
       }
   }
   ```

   The composer detects the data source's distribution trait to determine which operator topology to generate.
3. Extracted schema merging utilities into `SchemaMergingUtils` and deprecated the corresponding functions in `SchemaUtils`. Schema merging is now required in the Transform, Routing, and Schema evolution stages, and sources that support schema inference may need it as well, so unifying it in one place makes it easier to maintain (see the merging sketch at the end).
4. Updated migration test cases to cover CDC 3.2.0+ only. CDC 3.1.1 was released over 6 months ago, and keeping state compatibility with earlier versions is not really worthwhile.
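For illustration, here is a minimal, self-contained sketch of how the composer-side dispatch could look. The `DataSource` interface below only mirrors the snippet above, and `MyDistributedSource` / `buildTopology` are hypothetical names used for this example rather than parts of the actual composer API.

```java
// Minimal local mirror of the DataSource snippet shown above; MyDistributedSource
// and buildTopology are hypothetical names used only for this illustration.
interface DataSource {
    default boolean canContainDistributedTables() {
        return false;
    }
}

// A source whose data and schema change events may be scattered across
// partitions (e.g. Kafka) opts into the distributed topology by overriding the trait.
class MyDistributedSource implements DataSource {
    @Override
    public boolean canContainDistributedTables() {
        return true;
    }
}

class TopologySketch {
    // The composer inspects the trait and picks the matching operator set:
    // schema.distributed.* for distributed sources, schema.regular.* otherwise.
    static String buildTopology(DataSource source) {
        return source.canContainDistributedTables()
                ? "schema.distributed.SchemaOperator + SchemaCoordinator"
                : "schema.regular.SchemaOperator + SchemaCoordinator";
    }

    public static void main(String[] args) {
        System.out.println(buildTopology(new MyDistributedSource()));
        System.out.println(buildTopology(new DataSource() {}));
    }
}
```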

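The schema merging mentioned in point 3 can be pictured as computing a "least common" schema for two versions of the same table: take the union of their columns and widen conflicting column types. The sketch below is a simplified stand-in under assumed semantics (columns as a name-to-type map, only INT/BIGINT widening modeled), not the actual `SchemaMergingUtils` API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-in for schema merging; the real SchemaMergingUtils covers the
// full CDC type system, this only illustrates the least-common-schema idea.
class SchemaMergingSketch {

    // Union of columns from both schema versions, widening types on conflict.
    static Map<String, String> merge(Map<String, String> left, Map<String, String> right) {
        Map<String, String> merged = new LinkedHashMap<>(left);
        right.forEach((name, type) ->
                merged.merge(name, type, SchemaMergingSketch::widerType));
        return merged;
    }

    // Pick the wider of two column types; INT merged with BIGINT yields BIGINT.
    static String widerType(String a, String b) {
        if (a.equals(b)) {
            return a;
        }
        if (("INT".equals(a) && "BIGINT".equals(b)) || ("BIGINT".equals(a) && "INT".equals(b))) {
            return "BIGINT";
        }
        throw new IllegalStateException("Incompatible column types: " + a + " vs " + b);
    }

    public static void main(String[] args) {
        Map<String, String> v1 = new LinkedHashMap<>();
        v1.put("id", "INT");
        v1.put("name", "VARCHAR");

        Map<String, String> v2 = new LinkedHashMap<>();
        v2.put("id", "BIGINT");
        v2.put("age", "INT");

        // Prints {id=BIGINT, name=VARCHAR, age=INT}
        System.out.println(merge(v1, v2));
    }
}
```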