Re: [PR] [FLINK-36794] [cdc-composer/cli] pipeline cdc connector support multiple data sources [flink-cdc]

via GitHub Fri, 10 Jan 2025 02:45:47 -0800


yuxiqian commented on code in PR #3844:
URL: https://github.com/apache/flink-cdc/pull/3844#discussion_r1910194160



##########
docs/content.zh/docs/connectors/pipeline-connectors/mysql.md:
##########
@@ -77,6 +77,32 @@ pipeline:
    parallelism: 4
 ```
 
+## 多数据源示例
+
+单数据源，从多个 MySQL 读取数据同步到 Doris 的 Pipeline 可以定义如下：
+
+```yaml
+source:
+   type: mysql_mutiple

Review Comment:
   I do like @ChaomingZhangCN's proposed syntax for fully independent multiple 
data sources. It is intuitive and expressive, but it might be a chore for users 
who just want to connect to a MySQL cluster with multiple servers, since they 
would have to copy all the identical configuration into every source definition.
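   
   To make the duplication concern concrete, here is a rough sketch. It assumes 
a list-style `source` block, which is a hypothetical syntax for illustration and 
not necessarily the one proposed in this PR; host names and credentials are 
placeholders.

   ```yaml
   # Hypothetical syntax: `source` written as a list is an assumption made
   # for illustration only.
   source:
     - type: mysql
       hostname: mysql-node-1    # placeholder host
       port: 3306
       username: flink_user      # placeholder credentials
       password: "****"
       tables: app_db.\.*
     - type: mysql
       hostname: mysql-node-2    # the only value that differs
       port: 3306                # everything below is repeated verbatim
       username: flink_user
       password: "****"
       tables: app_db.\.*
   ```

   With N servers in a cluster, the shared options would be repeated N times.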
   
   @linjianchang's solution currently seems MySQL-specific, aimed mainly at 
multi-host clusters. It cannot be extended to heterogeneous sources (such as 
concatenating data from different DBMSs), or to cases where one wants different 
configs for each node. These cases don't exist yet, since the MySQL source 
connector is the only one we have, but because we're modifying the composer and 
the YAML API (rather than the MySQL connector itself), such possibilities should 
be discussed more carefully.
   
   As for multiple sources in a pipeline itself, I remember the idea was 
informally discussed with @leonardBang and @PatrickRen a long time ago, and the 
conclusion was that running multiple sources in a single job actually makes the 
pipeline more fragile, since any single-point failure could easily escalate and 
cause a global failover. Things might have changed since then, so we still need 
to hear from senior developers on this.



