amogh-jahagirdar commented on code in PR #5373:
URL: https://github.com/apache/iceberg/pull/5373#discussion_r932574242
##########
spark/v3.1/spark/src/main/java/org/apache/iceberg/spark/actions/BaseDeleteOrphanFilesSparkAction.java:
##########
@@ -182,12 +192,26 @@ private DeleteOrphanFiles.Result doExecute() {
List<String> orphanFiles =
actualFileDF.join(validFileDF, joinCond, "leftanti").as(Encoders.STRING()).collectAsList();
- Tasks.foreach(orphanFiles)
- .noRetry()
- .executeWith(deleteExecutorService)
- .suppressFailureWhenFinished()
- .onFailure((file, exc) -> LOG.warn("Failed to delete file: {}", file, exc))
- .run(deleteFunc::accept);
+ if (batchDeletionSize > 1) {
+ Preconditions.checkArgument(
+ table.io() instanceof SupportsBulkOperations,
+ "FileIO %s does not support bulk deletion",
+ table.io().getClass().getName());
+ SupportsBulkOperations bulkFileIO = (SupportsBulkOperations) table.io();
+ List<List<String>> fileBatches = Lists.partition(orphanFiles, batchDeletionSize);
+ Tasks.foreach(fileBatches)
+ .noRetry()
+ .executeWith(deleteExecutorService)
+ .suppressFailureWhenFinished()
+ .run(bulkFileIO::deleteFiles);
+ } else {
+ Tasks.foreach(orphanFiles)
+ .noRetry()
+ .executeWith(deleteExecutorService)
+ .suppressFailureWhenFinished()
+ .onFailure((file, exc) -> LOG.warn("Failed to delete file: {}", file, exc))
+ .run(deleteFunc::accept);
+ }
Review Comment:
This could probably be abstracted into a single method that delegates to the
right approach (bulk or per-file deletion), but I didn't want to introduce more
indirection or expose public methods unnecessarily until we know for sure we
want them.
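For illustration, here is a minimal, dependency-free sketch of what such a single delegating method could look like. It is not the PR's code or an Iceberg API: `SimpleFileIO` and `BulkDeletes` are hypothetical stand-ins for Iceberg's `FileIO` and `SupportsBulkOperations`, `OrphanFileDeleter.deleteOrphans` is a hypothetical helper, and `partition` reimplements the idea behind Guava's `Lists.partition`. It also runs sequentially instead of on `deleteExecutorService`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Iceberg's FileIO: only single-file deletion is modeled.
interface SimpleFileIO {
  void deleteFile(String path);
}

// Hypothetical stand-in for Iceberg's SupportsBulkOperations.
interface BulkDeletes {
  void deleteFiles(Iterable<String> paths);
}

final class OrphanFileDeleter {
  private OrphanFileDeleter() {}

  // Splits items into consecutive sublists of at most batchSize elements
  // (same idea as Guava's Lists.partition, reimplemented to avoid the dependency).
  static <T> List<List<T>> partition(List<T> items, int batchSize) {
    if (batchSize < 1) {
      throw new IllegalArgumentException("batchSize must be >= 1, got " + batchSize);
    }
    List<List<T>> batches = new ArrayList<>();
    for (int start = 0; start < items.size(); start += batchSize) {
      batches.add(items.subList(start, Math.min(start + batchSize, items.size())));
    }
    return batches;
  }

  // Single entry point: bulk deletion when batchSize > 1, per-file deletion otherwise.
  // Failures are logged and counted instead of rethrown, loosely mirroring
  // Tasks.suppressFailureWhenFinished(); work runs sequentially, not on an executor.
  // Returns the number of failed units (batches on the bulk path, files otherwise).
  static int deleteOrphans(SimpleFileIO io, List<String> files, int batchSize) {
    int failures = 0;
    if (batchSize > 1) {
      // Same precondition as the diff: bulk deletion requires bulk support.
      if (!(io instanceof BulkDeletes)) {
        throw new IllegalArgumentException(
            "FileIO " + io.getClass().getName() + " does not support bulk deletion");
      }
      BulkDeletes bulkIO = (BulkDeletes) io;
      for (List<String> batch : partition(files, batchSize)) {
        try {
          bulkIO.deleteFiles(batch);
        } catch (RuntimeException e) {
          failures++;
          System.err.println("Failed to delete batch of " + batch.size() + " files: " + e);
        }
      }
    } else {
      for (String file : files) {
        try {
          io.deleteFile(file);
        } catch (RuntimeException e) {
          failures++;
          System.err.println("Failed to delete file: " + file + ": " + e);
        }
      }
    }
    return failures;
  }
}
```

With an IO that implements both interfaces and `batchSize = 2`, the paths `a`, `b`, `c` go out as two `deleteFiles` calls, `[a, b]` and `[c]`; with `batchSize = 1` each path goes through `deleteFile`. Unlike the bulk branch in the diff, the sketch also logs failed batches, so both paths report failures the same way. Keeping the helper package-private would match the concern about not exposing public methods prematurely.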
--
This is an automated message from the Apache Git Service.