suneet-s commented on code in PR #15770:
URL: https://github.com/apache/druid/pull/15770#discussion_r1470097321
##########
extensions-core/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureDataSegmentKiller.java:
##########
@@ -63,6 +69,79 @@ public AzureDataSegmentKiller(
this.azureCloudBlobIterableFactory = azureCloudBlobIterableFactory;
}
+ @Override
+ public void kill(List<DataSegment> segments) throws SegmentLoadingException
+ {
+ if (segments.isEmpty()) {
+ return;
+ }
+ if (segments.size() == 1) {
+ kill(segments.get(0));
+ return;
+ }
+
+ // create a list of keys to delete
+ Map<String, List<String>> containerToKeysToDelete = new HashMap<>();
+ for (DataSegment segment : segments) {
+ Map<String, Object> loadSpec = segment.getLoadSpec();
+ final String containerName = MapUtils.getString(loadSpec, "containerName");
+ final String blobPath = MapUtils.getString(loadSpec, "blobPath");
+ List<String> keysToDelete = containerToKeysToDelete.computeIfAbsent(
+ containerName,
+ k -> new ArrayList<>()
+ );
+ keysToDelete.add(blobPath);
+ }
+
+ boolean shouldThrowException = false;
+ for (Map.Entry<String, List<String>> containerToKeys : containerToKeysToDelete.entrySet()) {
+ shouldThrowException = deleteBlobKeys(containerToKeys.getValue(), containerToKeys.getKey());
+ }
+
+ if (shouldThrowException) {
+ throw new SegmentLoadingException(
+ "Couldn't delete segments from Azure. See the task logs for more details."
+ );
+ }
+ }
+
+ private Boolean deleteBlobKeys(List<String> keysToDelete, String containerName)
+ {
+ boolean hadException = false;
+ List<List<String>> keysChunks = Lists.partition(
+ keysToDelete,
+ MAX_MULTI_OBJECT_DELETE_SIZE
Review Comment:
Can you explain why `deleteBlobKeys` is responsible for splitting the list
of keys into reasonably sized chunks for the bulk delete API? It looks like
a couple of other functions also call `azureStorage.batchDeleteFiles(...)`.
Should we push this behavior into `azureStorage.batchDeleteFiles`?
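A minimal, self-contained sketch of what owning the chunking inside a single batch-delete entry point could look like (plain Java, no Azure SDK; the size constant and the `deleteChunk` callback are stand-ins for Druid's actual API, not taken from the PR):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ChunkedBatchDelete
{
  // Stand-in for the Azure batch-delete size limit; in the PR this constant
  // lives in the killer rather than in the storage layer.
  static final int MAX_MULTI_OBJECT_DELETE_SIZE = 256;

  // If the batch-delete entry point owns the chunking, every caller can pass
  // an arbitrarily long key list and never think about the service limit.
  static void batchDeleteFiles(List<String> keys, Consumer<List<String>> deleteChunk)
  {
    for (int i = 0; i < keys.size(); i += MAX_MULTI_OBJECT_DELETE_SIZE) {
      deleteChunk.accept(keys.subList(i, Math.min(i + MAX_MULTI_OBJECT_DELETE_SIZE, keys.size())));
    }
  }

  public static void main(String[] args)
  {
    List<String> keys = new ArrayList<>();
    for (int i = 0; i < 600; i++) {
      keys.add("prefix/segment-" + i);
    }
    List<Integer> chunkSizes = new ArrayList<>();
    batchDeleteFiles(keys, chunk -> chunkSizes.add(chunk.size()));
    System.out.println(chunkSizes); // 600 keys split as 256 + 256 + 88
  }
}
```

With this shape, both the killer and the other existing callers hit the limit-aware path automatically.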
##########
extensions-core/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureStorage.java:
##########
@@ -188,6 +188,12 @@ public void batchDeleteFiles(String containerName, Iterable<String> paths, Integ
);
}
+ public void batchDeleteFiles(String containerName, Iterable<String> paths)
+ throws BlobBatchStorageException
+ {
+ batchDeleteFiles(containerName, paths, null);
+ }
Review Comment:
No need for this function; we can call the existing one with `null` directly.
It would be good to add javadocs to the function above noting that
`maxAttempts` is nullable and that callers should pass `null` when they want
to use the system-configured max retries.
```suggestion
```
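A sketch of what that javadoc might look like (the wording is illustrative, not from the PR; it only records the nullability contract described above):

```java
/**
 * Deletes the given blobs from the container in batches.
 *
 * @param maxAttempts maximum number of retries per batch; may be {@code null},
 *                    in which case the system-configured max retries are used
 */
```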
##########
extensions-core/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureDataSegmentKiller.java:
##########
@@ -63,6 +69,79 @@ public AzureDataSegmentKiller(
this.azureCloudBlobIterableFactory = azureCloudBlobIterableFactory;
}
+ @Override
+ public void kill(List<DataSegment> segments) throws SegmentLoadingException
+ {
+ if (segments.isEmpty()) {
+ return;
+ }
+ if (segments.size() == 1) {
+ kill(segments.get(0));
+ return;
+ }
+
+ // create a list of keys to delete
+ Map<String, List<String>> containerToKeysToDelete = new HashMap<>();
+ for (DataSegment segment : segments) {
+ Map<String, Object> loadSpec = segment.getLoadSpec();
+ final String containerName = MapUtils.getString(loadSpec, "containerName");
+ final String blobPath = MapUtils.getString(loadSpec, "blobPath");
+ List<String> keysToDelete = containerToKeysToDelete.computeIfAbsent(
+ containerName,
+ k -> new ArrayList<>()
+ );
+ keysToDelete.add(blobPath);
+ }
+
+ boolean shouldThrowException = false;
+ for (Map.Entry<String, List<String>> containerToKeys : containerToKeysToDelete.entrySet()) {
+ shouldThrowException = deleteBlobKeys(containerToKeys.getValue(), containerToKeys.getKey());
+ }
+
+ if (shouldThrowException) {
+ throw new SegmentLoadingException(
+ "Couldn't delete segments from Azure. See the task logs for more details."
+ );
+ }
+ }
+
+ private Boolean deleteBlobKeys(List<String> keysToDelete, String containerName)
+ {
+ boolean hadException = false;
+ List<List<String>> keysChunks = Lists.partition(
+ keysToDelete,
+ MAX_MULTI_OBJECT_DELETE_SIZE
+ );
+ for (List<String> chunkOfKeys : keysChunks) {
+ try {
+ log.info(
+ "Removing from container: [%s] the following index files: [%s] from s3!",
Review Comment:
```suggestion
"Removing from container [%s] the following files: [%s]",
```
##########
extensions-core/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureDataSegmentKiller.java:
##########
@@ -63,6 +69,79 @@ public AzureDataSegmentKiller(
this.azureCloudBlobIterableFactory = azureCloudBlobIterableFactory;
}
+ @Override
+ public void kill(List<DataSegment> segments) throws SegmentLoadingException
+ {
+ if (segments.isEmpty()) {
+ return;
+ }
+ if (segments.size() == 1) {
+ kill(segments.get(0));
+ return;
+ }
+
+ // create a list of keys to delete
+ Map<String, List<String>> containerToKeysToDelete = new HashMap<>();
+ for (DataSegment segment : segments) {
+ Map<String, Object> loadSpec = segment.getLoadSpec();
+ final String containerName = MapUtils.getString(loadSpec, "containerName");
+ final String blobPath = MapUtils.getString(loadSpec, "blobPath");
+ List<String> keysToDelete = containerToKeysToDelete.computeIfAbsent(
+ containerName,
+ k -> new ArrayList<>()
+ );
+ keysToDelete.add(blobPath);
+ }
+
+ boolean shouldThrowException = false;
+ for (Map.Entry<String, List<String>> containerToKeys : containerToKeysToDelete.entrySet()) {
+ shouldThrowException = deleteBlobKeys(containerToKeys.getValue(), containerToKeys.getKey());
Review Comment:
If `shouldThrowException` becomes true once, it should stay true; as written, a later successful container overwrites an earlier failure.
```suggestion
shouldThrowException = shouldThrowException || deleteBlobKeys(containerToKeys.getValue(), containerToKeys.getKey());
```
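The difference is easy to demonstrate in isolation (a toy sketch; `deleteBlobKeys` is simulated by a boolean array):

```java
public class OrAccumulationDemo
{
  public static void main(String[] args)
  {
    // Simulated per-container results: the first delete fails, the second succeeds.
    boolean[] hadException = {true, false};

    boolean overwrite = false;
    boolean accumulate = false;
    for (boolean result : hadException) {
      overwrite = result;                 // plain assignment: the last container wins
      accumulate = accumulate || result;  // OR-accumulation: any earlier failure sticks
    }
    System.out.println(overwrite);   // false -- the earlier failure was lost
    System.out.println(accumulate);  // true
  }
}
```

One caveat with the suggested form: `||` short-circuits, so once one container fails, `deleteBlobKeys` would be skipped for the remaining containers. Writing it as `deleteBlobKeys(...) || shouldThrowException` keeps deleting from every container while still recording the failure.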
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]