amogh-jahagirdar commented on code in PR #8525:
URL: https://github.com/apache/iceberg/pull/8525#discussion_r1330548802
##########
api/src/main/java/org/apache/iceberg/DeleteFiles.java:
##########
@@ -81,4 +81,8 @@ default DeleteFiles deleteFile(DataFile file) {
* @return this for method chaining
*/
DeleteFiles caseSensitive(boolean caseSensitive);
+
+ DeleteFiles validateFilesExist(boolean validateFilesToDeleteExist);
+
+ DeleteFiles validateFromSnapshot(long snapshot);
Review Comment:
Oh I maybe see what you're getting at, do you mean that it's sufficient to
simply just check the file existence (just a fileIO existence check) instead of
leveraging the validateDataFilesExist method? Sounds reasonable. I was
originally thinking we'd want to go through the manifests, because those would
be legitimate removal operations on Iceberg tables. As opposed to someone not
going through Iceberg and removing files and corrupting their table through
some other process which wouldn't be in the manifests. But in the end file
existence is the only thing we care about in terms of validation for the
deleteFiles to succeed.
I *think* going through the manifests via validateDataFilesExist would be
more efficient for larger groups of files we would want to validate. We end up
going through entries in manifests vs multiple HEAD requests for existence.
Maybe this doesn't matter in practice though and it's better just to do the
simple existence check because from a validation correctness standpoint it's
more robust
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]