amogh-jahagirdar commented on code in PR #5379:
URL: https://github.com/apache/iceberg/pull/5379#discussion_r938272527
##########
aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIO.java:
##########
@@ -168,56 +169,42 @@ public Map<String, String> properties() {
@Override
public void deleteFiles(Iterable<String> paths) throws
BulkDeletionFailureException {
if (awsProperties.s3DeleteTags() != null &&
!awsProperties.s3DeleteTags().isEmpty()) {
+ Set<Tag> deleteTags = awsProperties.s3DeleteTags();
Tasks.foreach(paths)
.noRetry()
.executeWith(executorService())
.suppressFailureWhenFinished()
.onFailure(
- (path, exc) ->
- LOG.warn(
- "Failed to add delete tags: {} to {}",
- awsProperties.s3DeleteTags(),
- path,
- exc))
- .run(path -> tagFileToDelete(path, awsProperties.s3DeleteTags()));
+ (path, exc) -> LOG.warn("Failed to add delete tags: {} to {}",
deleteTags, path, exc))
+ .run(path -> tagFileToDelete(path, deleteTags));
}
- if (!awsProperties.isS3DeleteEnabled()) {
- return;
+ if (awsProperties.isS3DeleteEnabled()) {
+ SetMultimap<String, String> bucketToObjects =
computeBucketToObjects(paths);
Review Comment:
@jackye1995 I don't think we need a separate API for listing actually, we
are already given the iterable of paths. This iterable can be the
ListObjectsV2Iterable (which is done as of today in the deletePrefix).
For lazily loading in memory, and still deleting in a concurrent manner we
can do the following:
1.) Still use the previous approach and constructing the bucket/key from the
path and keeping track of when the objects for a bucket hits a certain batch
size.
2.) Instead of using task framework, submit the deletion to an executor
service and just keep track of the future. we don't want to use tasks because
it will wait for the completion internally
We want to just submit the batch deletion and move on, and then at the very
end check the statuses of all those tasks. So using the underlying executor
service fits that pattern. Let me know what you think.
Will update the PR.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]