kfaraz commented on code in PR #14642:
URL: https://github.com/apache/druid/pull/14642#discussion_r1277089074


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/KillUnusedSegmentsTask.java:
##########
@@ -114,23 +119,37 @@ public TaskStatus runTask(TaskToolbox toolbox) throws Exception
     }
 
     // List unused segments
-    final List<DataSegment> unusedSegments = toolbox
+    final List<DataSegment> allUnusedSegments = toolbox
         .getTaskActionClient()
        .submit(new RetrieveUnusedSegmentsAction(getDataSource(), getInterval()));
 
-    if (!TaskLocks.isLockCoversSegments(taskLockMap, unusedSegments)) {
-      throw new ISE(
-          "Locks[%s] for task[%s] can't cover segments[%s]",
-          taskLockMap.values().stream().flatMap(List::stream).collect(Collectors.toList()),
-          getId(),
-          unusedSegments
-      );
+    final List<List<DataSegment>> unusedSegmentBatches = Lists.partition(allUnusedSegments, SEGMENT_NUKE_BATCH_SIZE);
+
+    // The individual activities here on the toolbox have possibility to run for a longer period of time,
+    // since they involve calls to metadata storage and archival object storage. And, the tasks take hold of the
+    // task lockbox to run. By splitting the segment list into smaller batches, we have an opportunity to yield the
+    // lock to other activity that might need to happen using the overlord tasklockbox.
+
+    for (final List<DataSegment> unusedSegments : unusedSegmentBatches) {
+      if (!TaskLocks.isLockCoversSegments(taskLockMap, unusedSegments)) {
+        throw new ISE(
+                "Locks[%s] for task[%s] can't cover segments[%s]",
+                taskLockMap.values().stream().flatMap(List::stream).collect(Collectors.toList()),
+                getId(),
+                unusedSegments
+        );
+      }
+
+      // Kill segments:
+      // Order is important here: we want the nuke action to clean up the metadata records _before_ the
+      // segments are removed from storage, this helps maintain that we will always have a storage segment if

Review Comment:
   We still can't be sure whether all the segments given to a `SegmentNukeAction` were successfully removed from metadata, since the action does not return a per-segment success indicator or even a total updated count.
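
   As an aside, the partition-then-process pattern from the hunk above can be sketched in plain Java without Guava. This is only an illustration of the batching step, not the actual `Lists.partition` implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionSketch
{
  // Plain-Java equivalent of the Guava Lists.partition call used in the PR:
  // split the full list into consecutive batches of at most batchSize elements.
  static <T> List<List<T>> partition(List<T> items, int batchSize)
  {
    final List<List<T>> batches = new ArrayList<>();
    for (int i = 0; i < items.size(); i += batchSize) {
      batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
    }
    return batches;
  }

  public static void main(String[] args)
  {
    final List<String> segments = new ArrayList<>();
    for (int i = 0; i < 25; i++) {
      segments.add("segment-" + i);
    }
    // 25 segments with a batch size of 10 produce batches of 10, 10 and 5.
    for (List<String> batch : partition(segments, 10)) {
      System.out.println(batch.size());
    }
  }
}
```

   Running the lock check and nuke per batch, as the PR does, gives the lockbox a chance to be released between iterations, which is the intent described in the quoted comment.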



##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/KillUnusedSegmentsTask.java:
##########
@@ -60,6 +61,10 @@ public class KillUnusedSegmentsTask extends AbstractFixedIntervalTask
 {
   private static final Logger LOG = new Logger(KillUnusedSegmentsTask.class);
 
+  // We split this to try and keep each nuke operation relatively short, in the case that either
+  // the database or the storage layer is particularly slow.
+  private static final int SEGMENT_NUKE_BATCH_SIZE = 10_000;

Review Comment:
   Should we do some benchmarking to determine the optimal size here? To me, this number seems too large to actually help with the batching.
   
   Unless a lot of segments have been marked as unused, either by a DropRule or the user calling an API, I expect the number of unused segments handled by a `kill` task to typically be under 1000.
   
   So, are the changes in the PR meant to be only upper-limit guardrails? Or a perf benefit for normal cases too?
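
   To make the sizing question concrete: with `SEGMENT_NUKE_BATCH_SIZE = 10_000`, a run handling under 1000 unused segments produces exactly one batch, so the lock is never yielded mid-run and the constant acts purely as an upper-limit guardrail. A quick sketch of that arithmetic (the segment counts below are illustrative, not measured):

```java
public class BatchCountSketch
{
  // Ceiling division: how many nuke batches a given number of unused
  // segments would be split into.
  static int batchCount(int totalSegments, int batchSize)
  {
    return (totalSegments + batchSize - 1) / batchSize;
  }

  public static void main(String[] args)
  {
    // Typical case from the comment above: well under the batch size.
    System.out.println(batchCount(1_000, 10_000));   // 1 batch, no yield points
    // Batching only changes behaviour once the threshold is crossed.
    System.out.println(batchCount(10_001, 10_000));  // 2 batches
    // A smaller batch size would yield the lock far more often.
    System.out.println(batchCount(100_000, 1_000));  // 100 batches
  }
}
```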



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

