xuyangzhong commented on code in PR #2030: URL: https://github.com/apache/fluss/pull/2030#discussion_r2570397980
##########
website/docs/engine-flink/delta-joins.md:
##########
@@ -172,11 +177,16 @@ Refer to the [Delta Join Issue](https://issues.apache.org/jira/browse/FLINK-3783
 #### Limitations

-- The primary key or the prefix lookup key of the tables must be included as part of the equivalence conditions in the join.
+- The primary key or the prefix key of the tables must be included as part of the equivalence conditions in the join.
 - The join must be a INNER join.
-- The downstream nodes of the join can accept duplicate changes, such as a sink that provides UPSERT mode.
-- When consuming a CDC stream, the join key used in the delta join must be part of the primary key.
-- All filters must be applied on the upsert key, and neither filters nor projections should contain non-deterministic functions.
+- The downstream node of the join must support idempotent updates, typically it's an upsert sink and should not have a `SinkUpsertMaterializer` node before it.
+  - Flink planner automatically inserts a `SinkUpsertMaterializer` when the sink’s primary key does not fully cover the upstream update key.
+  - You can learn more details about `SinkUpsertMaterializer` by reading this [blog](https://www.ververica.com/blog/flink-sql-secrets-mastering-the-art-of-changelog-events).
+- Since delta join does not support to handle update-before messages, it is necessary to ensure that the entire pipeline can safely discard update-before messages. That means when consuming a CDC stream:

Review Comment:
   In Flink, when consuming a changelog source, the source operator may output `insert`, `update-before`, `update-after`, and `delete` messages (`update-before` and `update-after` both originate from an update statement in the storage engine). What I mean here is that the delta join operator cannot handle (consume) the `update-before` messages output by the source operator.
