openinx commented on a change in pull request #2863:
URL: https://github.com/apache/iceberg/pull/2863#discussion_r677393295
##########
File path: flink/src/main/java/org/apache/iceberg/flink/IcebergTableSink.java
##########
@@ -63,6 +63,7 @@ public SinkRuntimeProvider getSinkRuntimeProvider(Context
context) {
.tableLoader(tableLoader)
.tableSchema(tableSchema)
.equalityFieldColumns(equalityColumns)
+ .upsert(equalityColumns.size() > 0)
Review comment:
The upsert mode will produce an extra DELETE for each INSERT operation,
that means the read scanning job will need to join more extra data which will
effect the read performance. If it's the CDC cases, all the events are
produced as an UPDATE_BEFORE and an UPDATE_AFTER and the stream can gurantee
the uniqueness when applying each events, then we don't have to enable the
upsert switch.
I will suggest to introduce a table property named `write.upsert.enable`
for flink table, so that people could choose whether they need the real UPSERT.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]