amogh-jahagirdar commented on code in PR #8525:
URL: https://github.com/apache/iceberg/pull/8525#discussion_r1330548802


##########
api/src/main/java/org/apache/iceberg/DeleteFiles.java:
##########
@@ -81,4 +81,8 @@ default DeleteFiles deleteFile(DataFile file) {
    * @return this for method chaining
    */
   DeleteFiles caseSensitive(boolean caseSensitive);
+
+  DeleteFiles validateFilesExist(boolean validateFilesToDeleteExist);
+
+  DeleteFiles validateFromSnapshot(long snapshot);

Review Comment:
   Oh I maybe see what you're getting at, do you mean that it's sufficient to 
simply just check the file existence (just a fileIO existence check) instead of 
leveraging the validateDataFilesExist method? Sounds reasonable. I was 
originally thinking we'd want to go through the manifests, because those would 
be legitimate operations on Iceberg tables. As opposed to someone not going 
through Iceberg and removing files and corrupting their table through some 
other process. But in the end file existence is the only thing we care about in 
terms of validation for the delete to succeed. 
   
   
   I *think* going through the manifests via validateDataFilesExist would be 
more efficient for larger groups of files we would want to validate. We end up 
going through entries in manifests vs multiple HEAD requests for existence. 
Maybe this doesn't matter in practice though and it's better just to do the 
simple existence check because from a validation correctness standpoint it's 
more robust



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to