[GitHub] [iceberg] openinx commented on a change in pull request #2863: Flink: Transform INSERT as one DELETE following one INSERT

GitBox Tue, 27 Jul 2021 05:20:11 -0700


openinx commented on a change in pull request #2863:
URL: https://github.com/apache/iceberg/pull/2863#discussion_r677393295




##########
File path: flink/src/main/java/org/apache/iceberg/flink/IcebergTableSink.java
##########
@@ -63,6 +63,7 @@ public SinkRuntimeProvider getSinkRuntimeProvider(Context context) {
         .tableLoader(tableLoader)
         .tableSchema(tableSchema)
         .equalityFieldColumns(equalityColumns)
+        .upsert(equalityColumns.size() > 0)

Review comment:
       Upsert mode produces an extra DELETE for each INSERT operation, which 
means the reading job has to join more data when scanning, and that will 
hurt read performance. In CDC cases, where all events are produced as an 
UPDATE_BEFORE and an UPDATE_AFTER and the stream can guarantee uniqueness 
as each event is applied, we don't have to enable the upsert switch. 
   
   I suggest introducing a table property named `write.upsert.enable` for 
Flink tables, so that people can choose whether they need real UPSERT.
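
   A minimal sketch of how such a switch could be resolved, instead of 
deriving it from `equalityColumns.size() > 0`. Only the property name 
`write.upsert.enable` comes from this comment; the `UpsertSwitch` class, the 
`upsertEnabled` method, and the default of `false` are hypothetical and are 
not existing Iceberg APIs.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Iceberg: decides whether the sink
// should run in upsert mode based on a table property.
final class UpsertSwitch {
  // Property name proposed in the review comment.
  static final String UPSERT_ENABLE = "write.upsert.enable";
  // Assumed default: upsert stays off unless the user opts in.
  static final boolean UPSERT_ENABLE_DEFAULT = false;

  private UpsertSwitch() {
  }

  static boolean upsertEnabled(Map<String, String> tableProperties,
                               List<String> equalityColumns) {
    String value = tableProperties.get(UPSERT_ENABLE);
    boolean requested = value == null
        ? UPSERT_ENABLE_DEFAULT
        : Boolean.parseBoolean(value);
    // Without equality columns there is no key to delete by, so this
    // sketch falls back to plain appends instead of failing.
    return requested && !equalityColumns.isEmpty();
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    List<String> keys = Arrays.asList("id");

    // Property unset: equality columns alone do not turn upsert on.
    System.out.println(upsertEnabled(props, keys));

    // Property set and equality columns present.
    props.put(UPSERT_ENABLE, "true");
    System.out.println(upsertEnabled(props, keys));

    // Property set but no equality columns.
    System.out.println(upsertEnabled(props, Collections.<String>emptyList()));
  }
}
```

   With something like this, the builder call in the diff could become 
`.upsert(UpsertSwitch.upsertEnabled(properties, equalityColumns))`, where 
`properties` stands for the table's property map. Whether a missing key set 
should fall back silently or be rejected is a design choice left open here.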




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


