rdblue commented on a change in pull request #1974:
URL: https://github.com/apache/iceberg/pull/1974#discussion_r549889701
##########
File path: flink/src/main/java/org/apache/iceberg/flink/sink/FlinkSink.java
##########
@@ -169,6 +172,17 @@ public Builder writeParallelism(int newWriteParallelism) {
return this;
}
+ /**
+ * Configuring the equality field columns for iceberg table that accept CDC or UPSERT events.
+ *
+ * @param columns defines the iceberg table's key.
+ * @return {@link Builder} to connect the iceberg table.
+ */
+ public Builder equalityFieldColumns(List<String> columns) {
Review comment:
For the bloom filter idea, @wangmiao1981 has been working on a proposal
for secondary indexes. I think that could be used for the check you're
suggesting here.
> people could choose to use UNIQUENESS ENFORCED or UNIQUENESS NOT-ENFORCED,
> in this way they could trade off between strong semantic and performance.
Are you saying that if uniqueness is enforced, each insert becomes an
upsert, but if uniqueness is not enforced, the sink would assume that
whatever is emitting records will correctly delete before inserting? That
sounds reasonable to me.
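To make sure we mean the same thing, here is a minimal sketch of the two modes as I understand them. The names `UpsertSketch`, `Op`, `Action`, and `onInsert` are hypothetical and are not part of `FlinkSink` or any Iceberg writer API; the sketch only models which write actions the sink would emit for one incoming insert.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the sink's behavior; not the actual FlinkSink implementation. */
final class UpsertSketch {
  enum Op { EQUALITY_DELETE, INSERT }

  /** One write action: an operation plus the equality key it applies to. */
  static final class Action {
    final Op op;
    final String key;

    Action(Op op, String key) {
      this.op = op;
      this.key = key;
    }

    @Override
    public String toString() {
      return op + "(" + key + ")";
    }
  }

  /**
   * Returns the actions emitted for a single incoming insert.
   * With uniqueness enforced, the insert is turned into an upsert:
   * an equality delete on the key, followed by the insert.
   * Without enforcement, the sink trusts the upstream to have
   * emitted any required delete already and writes only the insert.
   */
  static List<Action> onInsert(String key, boolean uniquenessEnforced) {
    List<Action> actions = new ArrayList<>();
    if (uniquenessEnforced) {
      actions.add(new Action(Op.EQUALITY_DELETE, key));
    }
    actions.add(new Action(Op.INSERT, key));
    return actions;
  }
}
```

In this model the enforced mode writes one equality delete per insert, which is where the extra delete volume in the not-yet-compacted table comes from.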
> Finally the size of delete files will be almost same as the size of data
> files. The process of merging on read will be quite inefficient because there
> are too many useless DELETE to JOIN.
I think that even if uniqueness is not enforced, tables will quickly require
compaction to rewrite the equality deletes. We should spend some time making
sure that we have good ways to maintain tables: compacting equality deletes
into position deletes, and position deletes into data files.
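As a rough illustration of those two compaction steps, here is a simplified sketch. It assumes a data file can be modeled as a list of row keys in file order, and it ignores sequence numbers, partitioning, and file I/O; `DeleteCompactionSketch` and its methods are hypothetical names, not existing Iceberg actions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of delete compaction; not an existing Iceberg API. */
final class DeleteCompactionSketch {

  /**
   * Step 1: resolve equality deletes against one data file.
   * Every row whose key matches a deleted key becomes a position delete
   * (the 0-based row position in that file), so readers no longer need
   * to join the file against the equality delete keys.
   */
  static List<Long> toPositionDeletes(List<String> rowKeysInFileOrder, Set<String> deletedKeys) {
    List<Long> positions = new ArrayList<>();
    for (int pos = 0; pos < rowKeysInFileOrder.size(); pos++) {
      if (deletedKeys.contains(rowKeysInFileOrder.get(pos))) {
        positions.add((long) pos);
      }
    }
    return positions;
  }

  /**
   * Step 2: rewrite the data file without the deleted positions,
   * after which the position deletes can be dropped as well.
   */
  static List<String> applyPositionDeletes(List<String> rowKeysInFileOrder, List<Long> positions) {
    Set<Long> deleted = new HashSet<>(positions);
    List<String> remaining = new ArrayList<>();
    for (int pos = 0; pos < rowKeysInFileOrder.size(); pos++) {
      if (!deleted.contains((long) pos)) {
        remaining.add(rowKeysInFileOrder.get(pos));
      }
    }
    return remaining;
  }
}
```

A real implementation would also have to respect which data files each equality delete applies to, which this sketch leaves out.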
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.