maytasm commented on PR #14662:
URL: https://github.com/apache/druid/pull/14662#issuecomment-1661484210

   @zachjsh I discussed this with @jasonk000 offline and we came up with an
alternative approach. The new approach should be much simpler, doesn't need a
new task type, and has several benefits. The approach we discussed is to modify
the existing KillUnusedSegmentsTask to take a new argument, `limit`. Auto
Kill would pass this limit to the KillUnusedSegmentsTask
(ClientKillUnusedSegmentsTaskQuery would need to be modified to carry this new
argument as well). Inside KillUnusedSegmentsTask, we would do the following:
   1. First, we compute min(batchSize, limit) (note that both batchSize and
limit are arguments of KillUnusedSegmentsTask); let's call this value
real_limit.
   2. We then run RetrieveUnusedSegmentsAction for the datasource, interval and
real_limit (so the number of segments retrieved here is at most
min(batchSize, limit)).
   3. Check that the task lock covers the segments retrieved in step 2.
   4. Run SegmentNukeAction on the segments retrieved in step 2.
   5. Call DataSegmentKiller.kill on the segments retrieved in step 2.
   6. Reduce limit by the number of segments we just killed.
   7. Repeat steps 1 to 6 until limit is 0 (or no more unused segments are
returned).
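   The loop in steps 1 to 7 can be sketched in Java as below. This is a minimal,
self-contained sketch, not Druid code: `SegmentStore`, `retrieveUnused`,
`lockCovers`, `nuke`, `kill` and `InMemoryStore` are hypothetical stand-ins for
RetrieveUnusedSegmentsAction, the lock check, SegmentNukeAction and
DataSegmentKiller.kill. It also assumes the loop stops early when a fetch
returns no segments, so an interval with fewer than `limit` unused segments
does not loop forever.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed batched kill loop. All types here are hypothetical
// stand-ins for Druid task actions, not Druid's real API.
public class KillLoopSketch {

  // Hypothetical stand-in for the metadata store plus deep storage.
  interface SegmentStore {
    // Stand-in for RetrieveUnusedSegmentsAction with a result cap.
    List<String> retrieveUnused(String dataSource, String interval, int maxSegments);
    // Stand-in for checking that the task lock covers the segments.
    boolean lockCovers(List<String> segments);
    // Stand-in for SegmentNukeAction (removes metadata entries).
    void nuke(List<String> segments);
    // Stand-in for DataSegmentKiller.kill (deletes files from deep storage).
    void kill(List<String> segments);
  }

  // Kills at most `limit` unused segments, at most `batchSize` per iteration.
  // Returns the number of segments killed.
  static int killUnused(SegmentStore store, String dataSource, String interval,
                        int batchSize, int limit) {
    int remaining = limit;
    int killed = 0;
    while (remaining > 0) {
      // Step 1: real_limit = min(batchSize, remaining limit).
      int realLimit = Math.min(batchSize, remaining);
      // Step 2: fetch at most realLimit unused segments.
      List<String> batch = store.retrieveUnused(dataSource, interval, realLimit);
      if (batch.isEmpty()) {
        break; // Assumption: nothing left to kill in this interval.
      }
      // Step 3: lock check.
      if (!store.lockCovers(batch)) {
        throw new IllegalStateException("Task lock does not cover the batch");
      }
      store.nuke(batch);          // Step 4
      store.kill(batch);          // Step 5
      remaining -= batch.size();  // Step 6
      killed += batch.size();
    }
    return killed;
  }

  // In-memory store for trying the loop out; records each nuked batch size.
  static class InMemoryStore implements SegmentStore {
    final List<String> unused = new ArrayList<>();
    final List<Integer> batchSizes = new ArrayList<>();

    InMemoryStore(int count) {
      for (int i = 0; i < count; i++) {
        unused.add("segment-" + i);
      }
    }

    public List<String> retrieveUnused(String dataSource, String interval, int maxSegments) {
      // Return a copy so later removals do not invalidate the batch.
      return new ArrayList<>(unused.subList(0, Math.min(maxSegments, unused.size())));
    }

    public boolean lockCovers(List<String> segments) {
      return true;
    }

    public void nuke(List<String> segments) {
      unused.removeAll(segments);
      batchSizes.add(segments.size());
    }

    public void kill(List<String> segments) {
      // No deep storage in this sketch.
    }
  }

  public static void main(String[] args) {
    InMemoryStore store = new InMemoryStore(25);
    int killed = killUnused(store, "wiki", "2023-01-01/2023-02-01", 10, 17);
    System.out.println(killed + " killed, batches " + store.batchSizes
        + ", " + store.unused.size() + " left");
  }
}
```

   With 25 unused segments, batchSize = 10 and limit = 17, `main` prints
`17 killed, batches [10, 7], 8 left`: the task stops at `limit` even though the
interval has more unused segments, and no single fetch exceeds `batchSize`.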
   
   The benefits here are:
   1) KillUnusedSegmentsTask would only delete up to `limit` segments, even if
the given interval has more unused segments.
   2) This improves the existing KillUnusedSegmentsTask without introducing a
new task type. Many users who issue KillUnusedSegmentsTask manually will
benefit from this.
   3) (We should do this anyway.) We should only fetch unused segments through
RetrieveUnusedSegmentsAction in batches. Fetching too many segments in a single
RetrieveUnusedSegmentsAction can cause the task to run out of memory (OOM).

