chenjunjiedada commented on code in PR #4703:
URL: https://github.com/apache/iceberg/pull/4703#discussion_r877687193
##########
api/src/main/java/org/apache/iceberg/RewriteFiles.java:
##########
@@ -84,4 +84,12 @@ RewriteFiles rewriteFiles(Set<DataFile> dataFilesToReplace, Set<DeleteFile> dele
* @return this for method chaining
*/
RewriteFiles validateFromSnapshot(long snapshotId);
+
+ /**
+ * Ignore the position deletes in rewrite validation. Flink upsert job only generates position deletes in the
+ * ongoing transaction, so it is not necessary to validate position deletes when rewriting.
+ *
+ * @return this for method chaining
+ */
+ RewriteFiles ignorePosDeletesInValidation();
Review Comment:
Let me walk through a use case. Assume we start with the following upsert
commits:
<pre>
commit-1:
{seq=1, [f[1], f[2], d[1]]}
commit-2:
{seq=1, [f[1], f[2], d[1]]},
{seq=2, [f[3], f[4], d[2]]}
</pre>
With the changes in this PR and #4748, rewrite planning and its result could
look as follows (deletes are not removed for now):
<pre>
plan:
task1 = {f[1], f[2], d[1]},
task2 = {f[3], f[4], d[2]}
commit-3:
{seq=1, [f[5], d[1]]},
{seq=2, [f[6], d[2]]}
</pre>
Now suppose we instead produce a new commit that contains new position
deletes targeting f[1] and f[2]:
<pre>
commit-3:
{seq=1, [f[1], f[2], d[1]]},
{seq=2, [f[3], f[4], d[2]]},
{seq=3, [d[3]]}
</pre>
With the suggested planning strategy, the result is:
<pre>
plan:
task1 = {f[1], f[2], d[3]}
commit-4:
{seq=1, [f[1], f[2], d[1]]},
{seq=2, [f[3], f[4], d[2]]},
{seq=3, [f[5], d[3]]}
</pre>
This looks weird unless we assume that the sequence number of a position
delete must be larger than that of the data files it references. If that is
the case, I would suggest adding another option to DeleteFileIndex and
RewriteFiles.
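
To make the sequence-number assumption concrete, here is a minimal Java sketch. It is not Iceberg code: `PosDeleteSeqSketch` and `posDeleteApplies` are hypothetical names used only for illustration. The check models just the table spec rule that a position delete applies to a data file whose sequence number is less than or equal to the delete file's, and it ignores partitions and referenced file paths.

```java
public class PosDeleteSeqSketch {

  // Hypothetical helper (not part of Iceberg's API): returns true when a
  // position delete with sequence number deleteSeq may apply to a data file
  // with sequence number dataSeq, looking only at sequence numbers.
  static boolean posDeleteApplies(long dataSeq, long deleteSeq) {
    return dataSeq <= deleteSeq;
  }

  public static void main(String[] args) {
    // Before the rewrite: d[3] (seq=3) targets f[1] and f[2] (seq=1).
    System.out.println("d[3] applies to f[1]: " + posDeleteApplies(1, 3));

    // Illustrative inverse case: a position delete that is older than the
    // data file would be skipped by this check.
    System.out.println("seq=1 delete applies to seq=3 data: " + posDeleteApplies(3, 1));
  }
}
```

Under this check, a delete at the same sequence number as a data file still passes, which is the case for f[5] and d[3] in commit-4.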