TSFenwick commented on code in PR #14131:
URL: https://github.com/apache/druid/pull/14131#discussion_r1193209315
##########
extensions-core/s3-extensions/src/main/java/org/apache/druid/storage/s3/S3DataSegmentKiller.java:
##########
@@ -64,13 +73,69 @@ public S3DataSegmentKiller(
this.inputDataConfig = inputDataConfig;
}
+  @Override
+  public void kill(List<DataSegment> segments) throws SegmentLoadingException
+  {
+    int size = segments.size();
+    if (size == 0) {
+      return;
+    }
+    if (size == 1) {
+      kill(segments.get(0));
+      return;
+    }
+
+    // We can assume that all segments are in the same bucket.
+    String s3Bucket = MapUtils.getString(segments.get(0).getLoadSpec(), BUCKET);
+    final ServerSideEncryptingAmazonS3 s3Client = this.s3ClientSupplier.get();
+
+    List<DeleteObjectsRequest.KeyVersion> keysToDelete = segments.stream()
+        .map(segment -> MapUtils.getString(segment.getLoadSpec(), KEY))
+        .flatMap(path -> Stream.of(
+            new DeleteObjectsRequest.KeyVersion(path),
+            new DeleteObjectsRequest.KeyVersion(DataSegmentKiller.descriptorPath(path))))
+        .collect(Collectors.toList());
+
+    // The max delete-objects request size for S3 is 1000 keys.
+    List<List<DeleteObjectsRequest.KeyVersion>> keysChunks = Lists.partition(keysToDelete, 1000);
+    DeleteObjectsRequest deleteObjectsRequest = new DeleteObjectsRequest(s3Bucket);
+    // In quiet mode, only the objects that failed to delete are returned.
+    deleteObjectsRequest.setQuiet(true);
+
+    List<String> keysNotDeleted = new ArrayList<>();
+    for (List<DeleteObjectsRequest.KeyVersion> keysChunk : keysChunks) {
+      List<String> keysToDeleteStrings = keysChunk.stream()
+          .map(DeleteObjectsRequest.KeyVersion::getKey)
+          .collect(Collectors.toList());
+      try {
+        deleteObjectsRequest.setKeys(keysChunk);
+        log.info("Removing from bucket: [%s] the following index files: [%s] from s3!", s3Bucket, keysToDeleteStrings);
+        s3Client.deleteObjects(deleteObjectsRequest);
+      }
+      catch (MultiObjectDeleteException e) {
+        keysNotDeleted.addAll(e.getErrors().stream()
+                               .map(MultiObjectDeleteException.DeleteError::getKey)
+                               .collect(Collectors.toList()));
+      }
+      catch (AmazonServiceException e) {
+        throw new SegmentLoadingException(e,
Review Comment:
That's not how it's done in the single kill, as I read it.
The single kill does:
```java
throw new SegmentLoadingException(e, "Couldn't kill segment[%s]: [%s]", segment.getId(), e);
```
What I'm trying to do here is balance making the code understandable with giving a reasonable log message, and also avoid making unnecessary calls when there are lots of segments to be deleted. The multi-object delete also behaves differently from the single kill in that the exceptions are used slightly differently.
A `MultiObjectDeleteException` comes from a successful delete call in which some objects that were requested to be deleted couldn't be, for a "valid" reason: for example, restricted permissions on one or all of the objects. The causes can be numerous. This exception also carries a list of all the objects it couldn't delete.
An `AmazonServiceException` is the kind of exception you get from a 4xx/5xx status code. It doesn't give a list of the objects that couldn't be deleted, and if you get a 401/403 here, there is no point in making lots more delete calls.
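To illustrate the distinction, here is a minimal self-contained sketch of the two-tier catch pattern. The exception classes and the `deleteBatch` helper are hypothetical stand-ins (not the AWS SDK's `MultiObjectDeleteException`/`AmazonServiceException`): a partial failure yields per-key errors and the loop continues, while a hard service error aborts instead of burning more calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchDeleteSketch {
  // Hypothetical stand-in for MultiObjectDeleteException: the call
  // "succeeded" overall, but some keys couldn't be deleted and are listed.
  static class PartialDeleteException extends Exception {
    final List<String> failedKeys;
    PartialDeleteException(List<String> failedKeys) { this.failedKeys = failedKeys; }
  }

  // Hypothetical stand-in for AmazonServiceException: a 4xx/5xx response
  // with no per-key detail.
  static class ServiceException extends RuntimeException {
    ServiceException(String msg) { super(msg); }
  }

  // Invented fake client: the key "restricted" fails per-key; the key
  // "forbidden" triggers a hard 403-style failure.
  static void deleteBatch(List<String> chunk) throws PartialDeleteException {
    if (chunk.contains("forbidden")) {
      throw new ServiceException("403 Forbidden");
    }
    List<String> failed = new ArrayList<>();
    for (String k : chunk) {
      if (k.equals("restricted")) {
        failed.add(k);
      }
    }
    if (!failed.isEmpty()) {
      throw new PartialDeleteException(failed);
    }
  }

  public static void main(String[] args) {
    List<List<String>> chunks = Arrays.asList(
        Arrays.asList("a", "restricted"),
        Arrays.asList("b", "c"));
    List<String> keysNotDeleted = new ArrayList<>();
    for (List<String> chunk : chunks) {
      try {
        deleteBatch(chunk);
      }
      catch (PartialDeleteException e) {
        // Partial failure: record the failed keys and keep going.
        keysNotDeleted.addAll(e.failedKeys);
      }
      catch (ServiceException e) {
        // Hard failure: on 401/403 there is no point continuing.
        throw new RuntimeException("giving up: " + e.getMessage(), e);
      }
    }
    System.out.println(keysNotDeleted); // [restricted]
  }
}
```

The "keep going on partial failure, abort on hard failure" split is the point: it accumulates every undeletable key for one final report, while refusing to hammer S3 once a request-level error shows the whole operation can't succeed.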
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]