linyanghao opened a new pull request, #7218: URL: https://github.com/apache/iceberg/pull/7218
When rewriting data files with Spark, the rewritten files are assigned the starting sequence number, i.e. the sequence number the table had when the rewrite began, so that they do not conflict with equality-delete files committed afterwards. When rewriting data files with Flink, however, the rewritten files are currently assigned a new sequence number. As a result, MergingSnapshotProducer.validateNoNewDeletesForDataFiles() cannot ignore conflicts between the rewritten files and newly committed eq-delete files, because eq-deletes with a lower sequence number would no longer apply to the rewritten files. For a table that continuously receives new eq-delete files, this makes it effectively impossible to rewrite data files without hitting commit conflicts.

This PR proposes to fix the problem by making Flink use the starting sequence number by default when rewriting data files, matching the Spark behavior.
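As a rough illustration of the difference (not the actual Flink action code), here is a minimal sketch at the core `RewriteFiles` API level. It assumes the three-argument `rewriteFiles` overload that the Spark rewrite path uses to carry the starting sequence number, and that the table has a current snapshot; the method names and the point at which the starting sequence number is captured are illustrative.

```java
import java.util.Set;

import org.apache.iceberg.DataFile;
import org.apache.iceberg.RewriteFiles;
import org.apache.iceberg.Table;

public class RewriteSequenceNumberSketch {

  // Behavior this PR proposes for Flink: the rewritten files keep the sequence number
  // that was current when the rewrite started, so eq-deletes committed afterwards still
  // apply to them and the conflict check does not need to fail the commit.
  static void commitWithStartingSequenceNumber(
      Table table, Set<DataFile> rewrittenFiles, Set<DataFile> addedFiles) {
    // Hypothetical capture point: sequence number taken before planning the rewrite.
    long startingSequenceNumber = table.currentSnapshot().sequenceNumber();

    RewriteFiles rewrite = table.newRewrite();
    // The three-argument overload assigns the given sequence number to the new files.
    rewrite.rewriteFiles(rewrittenFiles, addedFiles, startingSequenceNumber);
    rewrite.commit();
  }

  // Previous Flink behavior: no explicit sequence number, so the rewritten files get the
  // new snapshot's sequence number and concurrent eq-deletes would not apply to them,
  // which is why validateNoNewDeletesForDataFiles() must treat this as a conflict.
  static void commitWithNewSequenceNumber(
      Table table, Set<DataFile> rewrittenFiles, Set<DataFile> addedFiles) {
    table.newRewrite()
        .rewriteFiles(rewrittenFiles, addedFiles)
        .commit();
  }
}
```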
