linyanghao opened a new pull request, #7218: URL: https://github.com/apache/iceberg/pull/7218
When rewriting data files with Spark, the rewritten files are assigned the starting sequence number, i.e. the sequence number the table had when the rewrite began, so that they do not conflict with equality-delete files committed afterwards. When rewriting data files with Flink, however, the rewritten files are currently assigned a new sequence number. As a result, MergingSnapshotProducer.validateNoNewDeletesForDataFiles() cannot ignore conflicts between the rewritten files and newly committed eq-delete files, because eq-deletes with a lower sequence number would no longer apply to the rewritten files. For a table that continuously receives new eq-delete files, this makes it effectively impossible to rewrite data files without hitting commit conflicts.

This PR proposes to fix the problem by making Flink use the starting sequence number by default when rewriting data files, matching the Spark behavior.
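As a rough illustration of the difference (not the actual Flink action code), here is a minimal sketch at the core `RewriteFiles` API level. It assumes the three-argument `rewriteFiles` overload that the Spark rewrite path uses to carry the starting sequence number, and that the table has a current snapshot; the method names and the point at which the starting sequence number is captured are illustrative.

```java
import java.util.Set;

import org.apache.iceberg.DataFile;
import org.apache.iceberg.RewriteFiles;
import org.apache.iceberg.Table;

public class RewriteSequenceNumberSketch {

  // Behavior this PR proposes for Flink: the rewritten files keep the sequence number
  // that was current when the rewrite started, so eq-deletes committed afterwards still
  // apply to them and the conflict check does not need to fail the commit.
  static void commitWithStartingSequenceNumber(
      Table table, Set<DataFile> rewrittenFiles, Set<DataFile> addedFiles) {
    // Hypothetical capture point: sequence number taken before planning the rewrite.
    long startingSequenceNumber = table.currentSnapshot().sequenceNumber();

    RewriteFiles rewrite = table.newRewrite();
    // The three-argument overload assigns the given sequence number to the new files.
    rewrite.rewriteFiles(rewrittenFiles, addedFiles, startingSequenceNumber);
    rewrite.commit();
  }

  // Previous Flink behavior: no explicit sequence number, so the rewritten files get the
  // new snapshot's sequence number and concurrent eq-deletes would not apply to them,
  // which is why validateNoNewDeletesForDataFiles() must treat this as a conflict.
  static void commitWithNewSequenceNumber(
      Table table, Set<DataFile> rewrittenFiles, Set<DataFile> addedFiles) {
    table.newRewrite()
        .rewriteFiles(rewrittenFiles, addedFiles)
        .commit();
  }
}
```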
