kfaraz commented on code in PR #14642:
URL: https://github.com/apache/druid/pull/14642#discussion_r1278228942


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/KillUnusedSegmentsTask.java:
##########
@@ -60,6 +61,10 @@ public class KillUnusedSegmentsTask extends 
AbstractFixedIntervalTask
 {
   private static final Logger LOG = new Logger(KillUnusedSegmentsTask.class);
 
+  // We split this to try and keep each nuke operation relatively short, in 
the case that either
+  // the database or the storage layer is particularly slow.
+  private static final int SEGMENT_NUKE_BATCH_SIZE = 10_000;

Review Comment:
   > By background we currently have a huge backlog maintenance cleanup (100+ 
million segments), and are arbitrarily limiting kill task Interval size to 
~60-80K segments, which takes ~30-40 seconds of stalling the lockbox
   
   For such cases, I really feel that deleting by interval (rather than 
segmentIds) would help significantly. I don't see how the semantics would be 
any different.
   
   Since you are looking into this, I wonder if you could try evaluating that 
angle as well. Sorry for going in circles, 😅 . I remember we discussed it but I 
forget why we decided to not implement it in this PR. Are there any cases that 
it does not handle?
   
   Also, there is some related work going on in #14662 . We should try to 
ensure that we do not duplicate any work.



##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/KillUnusedSegmentsTask.java:
##########
@@ -60,6 +61,10 @@ public class KillUnusedSegmentsTask extends 
AbstractFixedIntervalTask
 {
   private static final Logger LOG = new Logger(KillUnusedSegmentsTask.class);
 
+  // We split this to try and keep each nuke operation relatively short, in 
the case that either
+  // the database or the storage layer is particularly slow.
+  private static final int SEGMENT_NUKE_BATCH_SIZE = 10_000;

Review Comment:
   > By background we currently have a huge backlog maintenance cleanup (100+ 
million segments), and are arbitrarily limiting kill task Interval size to 
~60-80K segments, which takes ~30-40 seconds of stalling the lockbox
   
   For such cases, I really feel that deleting by interval (rather than 
segmentIds) would help significantly. I don't see how the semantics would be 
any different.
   
   Since you are looking into this, I wonder if you could try evaluating that 
angle as well. Sorry for going in circles, 😅 . I remember we discussed it but I 
forget why we decided to not implement it in this PR. Are there any cases that 
it does not handle?
   
   Also, there is some related work going on in #14662 . We should try to 
ensure that we do not duplicate any work.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to