amogh-jahagirdar commented on code in PR #8525:
URL: https://github.com/apache/iceberg/pull/8525#discussion_r1330512680
##########
api/src/main/java/org/apache/iceberg/DeleteFiles.java:
##########
@@ -81,4 +81,8 @@ default DeleteFiles deleteFile(DataFile file) {
* @return this for method chaining
*/
DeleteFiles caseSensitive(boolean caseSensitive);
+
+ DeleteFiles validateFilesExist(boolean validateFilesToDeleteExist);
+
+ DeleteFiles validateFromSnapshot(long snapshot);
Review Comment:
The implementation currently relies on
`MergingSnapshotProducer#validateDataFilesExist` to verify that the files
exist and weren't removed concurrently. That method requires a
`startingSnapshotId`, which marks the start of the history examined during
conflict detection (in this case, checking whether any manifests added since
`startingSnapshotId` have entries indicating that the file was already
removed).
Callers of the DeleteFiles API are not required to provide a
`startingSnapshotId`. By default the `validateFromSnapshot` value in the
implementation is null, which means the conflict validation check walks a
longer history (from the first snapshot in the current metadata) and reads
manifests that may not need to be read at all. Exposing this option lets
callers who know a good starting point validate a narrower range and reduce
the time spent in conflict detection.
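To illustrate the difference in how much history gets scanned, here is a minimal, self-contained sketch. It is not the Iceberg implementation: `ValidationRangeSketch`, `Snap`, `snapshotsToScan`, and `removedSince` are hypothetical names, and a snapshot is modeled as just an id plus the set of file paths it removed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ValidationRangeSketch {
    // Hypothetical stand-in for a snapshot: an id plus the data file paths it removed.
    public static final class Snap {
        final long id;
        final Set<String> removedPaths;

        public Snap(long id, Set<String> removedPaths) {
            this.id = id;
            this.removedPaths = removedPaths;
        }
    }

    // Returns the snapshots a validation pass would have to inspect.
    // A null starting id means "no starting point given": scan the whole history.
    // Otherwise only snapshots committed after the starting snapshot are scanned.
    public static List<Snap> snapshotsToScan(List<Snap> history, Long startingSnapshotId) {
        if (startingSnapshotId == null) {
            return history;
        }
        List<Snap> result = new ArrayList<>();
        boolean seenStart = false;
        for (Snap snap : history) {
            if (seenStart) {
                result.add(snap);
            }
            if (snap.id == startingSnapshotId) {
                seenStart = true;
            }
        }
        if (!seenStart) {
            throw new IllegalArgumentException("Unknown snapshot id: " + startingSnapshotId);
        }
        return result;
    }

    // True if any scanned snapshot recorded the given path as removed.
    public static boolean removedSince(List<Snap> history, Long startingSnapshotId, String path) {
        for (Snap snap : snapshotsToScan(history, startingSnapshotId)) {
            if (snap.removedPaths.contains(path)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // History is ordered oldest to newest.
        List<Snap> history = List.of(
            new Snap(1L, Set.of()),
            new Snap(2L, Set.of("a.parquet")),
            new Snap(3L, Set.of()),
            new Snap(4L, Set.of("b.parquet")));

        System.out.println("no starting point: " + snapshotsToScan(history, null).size() + " scanned");
        System.out.println("from snapshot 3: " + snapshotsToScan(history, 3L).size() + " scanned");
        System.out.println("b.parquet removed since 3: " + removedSince(history, 3L, "b.parquet"));
    }
}
```

With the methods proposed in this diff, a caller that read the table at a known snapshot would presumably chain `validateFilesExist(true)` and `validateFromSnapshot(thatSnapshotId)` on the `DeleteFiles` operation before committing, so that only manifests added after that snapshot are read.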
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]