rdblue commented on code in PR #5096:
URL: https://github.com/apache/iceberg/pull/5096#discussion_r902049477
##########
aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIO.java:
##########
@@ -241,6 +246,52 @@ private List<String> deleteObjectsInBucket(String bucket, Collection<String> obj
return Lists.newArrayList();
}
+ @Override
+ public Stream<FileInfo> listPrefix(String prefix) {
+ S3URI s3uri = new S3URI(prefix, awsProperties.s3BucketToAccessPointMapping());
+
+ return internalListPrefix(s3uri.bucket(), s3uri.key()).stream()
+ .flatMap(r -> r.contents().stream())
+ .map(o -> new FileInfo(o.key(), o.size(), o.lastModified()));
+ }
+
+ /**
+ * This method makes a best-effort attempt to delete all objects under the
+ * given prefix.
+ *
+ * Deletes are issued as bulk operations and failed deletes are not retried;
+ * any individual objects that a bulk operation fails to delete are logged.
+ *
+ * @param prefix prefix to delete
+ */
+ @Override
+ public void deletePrefix(String prefix) {
+ S3URI s3uri = new S3URI(prefix, awsProperties.s3BucketToAccessPointMapping());
+
+ internalListPrefix(s3uri.bucket(), s3uri.key()).stream().parallel().forEach(listing -> {
Review Comment:
Rather than using `parallel`, can you use `Tasks` and a thread pool? Tasks
is how we prefer to parallelize operations because it gives you much better
control over error handling:
```java
// One task per S3Object, flattened from the listing responses as in listPrefix
Tasks.foreach(internalListPrefix(s3uri.bucket(), s3uri.key()).stream()
        .flatMap(r -> r.contents().stream()))
    .suppressFailureWhenFinished()
    .executeWith(executorService)
    .retry(5)
    .exponentialBackoff(10, 60_000, 600_000, 2)
    .onlyRetryOn(ThrottledException.class, TooManyRequestsException.class)
    // SLF4J-style {} placeholder; the trailing exception is logged with its stack trace
    .onFailure((task, exc) -> LOG.warn("Failed to delete S3 key: {}", task.key(), exc))
    .run(o -> {
      ObjectIdentifier obj = ObjectIdentifier.builder().key(o.key()).build();
      ...
    });
```
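The behavior this chain relies on (bounded retries with exponential backoff on throttling errors only, a per-item failure callback, and suppressing failures so one bad key does not abort the rest) can be sketched with only the JDK. This is a hypothetical stand-in for illustration, not Iceberg's `Tasks` implementation: the class `BestEffortForEach`, its parameters, and the choice to retry on a single exception class are assumptions, and it has no overall cap on total retry time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

/**
 * Hypothetical plain-JDK sketch: run each item on a caller-supplied pool,
 * retry only on one exception type with exponential backoff, report final
 * failures through a callback, and never throw to the caller.
 */
public final class BestEffortForEach {
  private BestEffortForEach() {}

  /** Runs action on every item and returns how many items ultimately failed. */
  public static <T> int run(
      List<T> items,
      ExecutorService pool,
      int maxRetries,
      long initialSleepMs,
      long maxSleepMs,
      double scaleFactor,
      Class<? extends RuntimeException> retryOn,
      Consumer<T> action,
      BiConsumer<T, Exception> onFailure) {
    List<Future<Boolean>> futures = new ArrayList<>();
    for (T item : items) {
      Callable<Boolean> task = () -> attempt(
          item, maxRetries, initialSleepMs, maxSleepMs, scaleFactor, retryOn, action, onFailure);
      futures.add(pool.submit(task));
    }

    // Wait for every item; a failure on one item does not stop the others.
    int failures = 0;
    for (Future<Boolean> future : futures) {
      try {
        if (!future.get()) {
          failures++;
        }
      } catch (ExecutionException e) {
        failures++; // e.g. the worker was interrupted while backing off
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        failures++;
      }
    }
    return failures;
  }

  private static <T> boolean attempt(
      T item, int maxRetries, long initialSleepMs, long maxSleepMs, double scaleFactor,
      Class<? extends RuntimeException> retryOn, Consumer<T> action,
      BiConsumer<T, Exception> onFailure) throws InterruptedException {
    long sleepMs = initialSleepMs;
    for (int attempt = 0; ; attempt++) {
      try {
        action.accept(item);
        return true;
      } catch (RuntimeException e) {
        // Give up on non-retryable errors or once the retry budget is spent.
        if (attempt >= maxRetries || !retryOn.isInstance(e)) {
          onFailure.accept(item, e);
          return false;
        }
        Thread.sleep(sleepMs);
        sleepMs = Math.min(maxSleepMs, (long) (sleepMs * scaleFactor));
      }
    }
  }
}
```

In the S3 case, `action` would issue the delete for a single key and `retryOn` would be whatever throttling exception the SDK actually surfaces; the caller owns the pool and is responsible for shutting it down, which is the kind of control the review is asking for compared to `parallel()` on the common fork-join pool.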