rdblue commented on a change in pull request #1974:
URL: https://github.com/apache/iceberg/pull/1974#discussion_r549889701



##########
File path: flink/src/main/java/org/apache/iceberg/flink/sink/FlinkSink.java
##########
@@ -169,6 +172,17 @@ public Builder writeParallelism(int newWriteParallelism) {
       return this;
     }
 
+    /**
+     * Configuring the equality field columns for iceberg table that accept CDC or UPSERT events.
+     *
+     * @param columns defines the iceberg table's key.
+     * @return {@link Builder} to connect the iceberg table.
+     */
+    public Builder equalityFieldColumns(List<String> columns) {

Review comment:
       For the bloom filter idea, @wangmiao1981 has been working on a proposal 
for secondary indexes. I think that could be used for the check you're 
suggesting here.
   
   > people could choose to use UNIQUENESS ENFORCED or UNIQUENESS NOT-ENFORCED, 
in this way they could trade off between strong semantic and performance.
   
   Are you saying that if uniqueness is enforced, each insert becomes an 
upsert, but if uniqueness is not enforced, the sink would assume that 
whatever is emitting records will correctly delete before inserting? That 
sounds reasonable to me.
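   
   To make the two modes concrete, here is a minimal in-memory sketch of how I 
read the proposal. It is not Iceberg or Flink API: `UniquenessSketch`, `Event`, 
`Written`, and `write` are hypothetical names, and the lists only stand in for 
data files and equality delete files.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the sink behaviour under discussion; not real Iceberg/Flink API.
final class UniquenessSketch {
  enum Op { INSERT, DELETE }

  // One change event: the operation plus the row's key and value.
  static final class Event {
    final Op op;
    final String key;
    final String value;

    Event(Op op, String key, String value) {
      this.op = op;
      this.key = key;
      this.value = value;
    }
  }

  // What the sink would write: data rows and equality deletes (one entry per deleted key).
  static final class Written {
    final List<String> dataRows = new ArrayList<>();
    final List<String> equalityDeletes = new ArrayList<>();
  }

  static Written write(List<Event> events, boolean uniquenessEnforced) {
    Written out = new Written();
    for (Event e : events) {
      if (e.op == Op.DELETE) {
        // Explicit deletes are always written as equality deletes.
        out.equalityDeletes.add(e.key);
      } else {
        if (uniquenessEnforced) {
          // Enforced: treat the insert as an upsert, so delete the key first.
          out.equalityDeletes.add(e.key);
        }
        // Not enforced: trust the upstream to have emitted the DELETE already.
        out.dataRows.add(e.key + "=" + e.value);
      }
    }
    return out;
  }
}
```

   In this model the enforced mode writes one equality delete per inserted row, 
while the not-enforced mode writes deletes only for explicit DELETE events.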
   
   > Finally the size of delete files will be almost same as the size of data 
files. The process of merging on read will be quite inefficient because there 
are too many useless DELETE to JOIN.
   
   I think that even if uniqueness is not enforced, tables will quickly require 
compaction to rewrite the equality deletes. We should spend some time 
making sure that we have good ways to maintain tables and compact equality 
deletes into position deletes, and position deletes into data files.
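   
   As a rough illustration of those two compaction steps, here is a simplified 
sketch over a single in-memory "data file". The names (`CompactionSketch`, 
`toPositionDeletes`, `rewrite`) are hypothetical, and it assumes every row in 
the file is older than the equality deletes being applied; a real 
implementation would also have to respect sequence numbers.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the two compaction steps; not real Iceberg API.
final class CompactionSketch {
  // Step 1: resolve equality deletes (by key) into row positions within one data file.
  static List<Integer> toPositionDeletes(List<String> keysInFile, Set<String> deletedKeys) {
    List<Integer> positions = new ArrayList<>();
    for (int pos = 0; pos < keysInFile.size(); pos++) {
      if (deletedKeys.contains(keysInFile.get(pos))) {
        positions.add(pos);
      }
    }
    return positions;
  }

  // Step 2: rewrite the data file, dropping the rows at the deleted positions.
  static List<String> rewrite(List<String> keysInFile, List<Integer> deletedPositions) {
    Set<Integer> deleted = new HashSet<>(deletedPositions);
    List<String> kept = new ArrayList<>();
    for (int pos = 0; pos < keysInFile.size(); pos++) {
      if (!deleted.contains(pos)) {
        kept.add(keysInFile.get(pos));
      }
    }
    return kept;
  }
}
```

   After step 1 a reader only needs a position lookup instead of a join on the 
key, and after step 2 the file needs no delete files at all.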






