danny0405 commented on code in PR #14320:
URL: https://github.com/apache/hudi/pull/14320#discussion_r2553803399

##########
website/docs/ingestion_flink.md:
##########
@@ -1,179 +1,361 @@
 ---
 title: Using Flink
 keywords: [hudi, flink, streamer, ingestion]
+last_modified_at: 2025-11-22T12:53:57+08:00
 ---
 
-### CDC Ingestion
-CDC(change data capture) keep track of the data changes evolving in a source system so a downstream process or system can action that change.
+## CDC Ingestion
+
+CDC (change data capture) keeps track of data changes evolving in a source system so a downstream process or system can act on those changes.
 We recommend two ways for syncing CDC data into Hudi:
 
-1. Using the Ververica [flink-cdc-connectors](https://github.com/ververica/flink-cdc-connectors) directly connect to DB Server to sync the binlog data into Hudi.
-   The advantage is that it does not rely on message queues, but the disadvantage is that it puts pressure on the db server;
-2. Consume data from a message queue (for e.g, the Kafka) using the flink cdc format, the advantage is that it is highly scalable,
+1. Use the Ververica [flink-cdc-connectors](https://github.com/ververica/flink-cdc-connectors) to directly connect to the database server and sync binlog data into Hudi.
+   The advantage is that it does not rely on message queues, but the disadvantage is that it puts pressure on the database server.
+2. Consume data from a message queue (e.g., Kafka) using the Flink CDC format. The advantage is that it is highly scalable,
    but the disadvantage is that it relies on message queues.
 
 :::note
-- If the upstream data cannot guarantee the order, you need to specify option `write.precombine.field` explicitly;
+If the upstream data cannot guarantee ordering, you need to explicitly specify the `write.precombine.field` option.

Review Comment:
   write.precombine.field -> ordering.fields

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
