maytasm commented on PR #14662: URL: https://github.com/apache/druid/pull/14662#issuecomment-1661484210
@zachjsh I discussed this with @jasonk000 offline, and we came up with an alternative approach. The new approach should be much simpler, doesn't need a new task type, and has multiple benefits.

The approach we discussed is to modify the existing KillUnusedSegmentsTask to take a new argument, `limit`. Auto Kill would pass this limit to the KillUnusedSegmentsTask (ClientKillUnusedSegmentsTaskQuery would need to be modified to carry this new argument as well). Inside KillUnusedSegmentsTask, we would do the following:

1. Compute min(batchSize, limit) (note that both `batchSize` and `limit` are arguments of KillUnusedSegmentsTask). Let's call this value `real_limit`.
2. Call RetrieveUnusedSegmentsAction for the datasource, interval and `real_limit`, so at most min(batchSize, limit) segments are retrieved here.
3. Check that the task's lock covers the segments retrieved in step 2.
4. Run SegmentNukeAction for the segments retrieved in step 2.
5. Call DataSegmentKiller.kill for the segments retrieved in step 2.
6. Reduce `limit` by the number of segments we just killed.
7. Repeat steps 1 to 6 until `limit` is 0.

The benefits here are:

1. KillUnusedSegmentsTask would only delete up to `limit` segments, even if the given interval contains more unused segments.
2. This improves the existing KillUnusedSegmentsTask without introducing a new task type, so the many users who issue KillUnusedSegmentsTask manually will benefit from it too.
3. (We should do this anyway.) We should only fetch unused segments via RetrieveUnusedSegmentsAction in batches: fetching too many segments in a single RetrieveUnusedSegmentsAction can cause the task to run out of memory (OOM).
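The batched loop in steps 1 to 7 can be sketched in a few lines. This is a minimal illustration in Python for brevity, not Druid's actual Java code: `kill_unused_segments`, `Segment`, `InMemoryStore`, and the callables passed in are hypothetical stand-ins for RetrieveUnusedSegmentsAction, the lock check, SegmentNukeAction and DataSegmentKiller.kill, and the `"wikipedia"` datasource and interval in the usage are made up. The sketch also adds one guard the steps above don't state: the loop stops when a fetch returns no segments, so it still terminates when the interval holds fewer than `limit` unused segments.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for a Druid DataSegment; only the id matters here."""
    segment_id: str


def kill_unused_segments(
    datasource: str,
    interval: str,
    batch_size: int,
    limit: int,
    retrieve_unused: Callable[[str, str, int], List[Segment]],
    lock_covers: Callable[[Sequence[Segment]], bool],
    nuke: Callable[[Sequence[Segment]], None],
    kill_from_storage: Callable[[Sequence[Segment]], None],
) -> int:
    """Kill at most `limit` unused segments, fetching at most `batch_size` per round."""
    if batch_size <= 0 or limit < 0:
        raise ValueError("batch_size must be positive and limit non-negative")
    killed = 0
    remaining = limit
    while remaining > 0:
        # Step 1: never fetch more than one batch or more than what is left of the limit.
        real_limit = min(batch_size, remaining)
        # Step 2: fetch only real_limit unused segments, bounding memory per round.
        batch = retrieve_unused(datasource, interval, real_limit)
        if not batch:
            # Extra guard (assumption, not in the steps): stop once nothing is left,
            # otherwise the loop never ends when fewer than `limit` segments exist.
            break
        # Step 3: the task's lock must cover every segment it is about to delete.
        if not lock_covers(batch):
            raise RuntimeError("task lock does not cover the retrieved segments")
        # Step 4: delete the segments' metadata records (SegmentNukeAction's role).
        nuke(batch)
        # Step 5: delete the segment files from deep storage (DataSegmentKiller.kill's role).
        kill_from_storage(batch)
        # Step 6: count this batch against the limit; step 7 is the loop itself.
        killed += len(batch)
        remaining -= len(batch)
    return killed


class InMemoryStore:
    """Hypothetical fake for the metadata store and deep storage."""

    def __init__(self, segment_ids: List[str]) -> None:
        self.unused = [Segment(s) for s in segment_ids]
        self.deleted_files: List[str] = []
        self.fetch_sizes: List[int] = []  # records the real_limit of every fetch

    def retrieve_unused(self, datasource: str, interval: str, limit: int) -> List[Segment]:
        self.fetch_sizes.append(limit)
        return self.unused[:limit]

    def nuke(self, segments: Sequence[Segment]) -> None:
        for seg in segments:
            self.unused.remove(seg)

    def kill_from_storage(self, segments: Sequence[Segment]) -> None:
        self.deleted_files.extend(seg.segment_id for seg in segments)


store = InMemoryStore([f"seg{i}" for i in range(10)])
killed = kill_unused_segments(
    "wikipedia", "2023-01-01/2023-02-01", batch_size=3, limit=7,
    retrieve_unused=store.retrieve_unused, lock_covers=lambda segs: True,
    nuke=store.nuke, kill_from_storage=store.kill_from_storage,
)
print(killed, store.fetch_sizes, len(store.unused))  # 7 [3, 3, 1] 3
```

With 10 unused segments, a batch size of 3 and a limit of 7, the sketch fetches 3, 3, then 1 segment and leaves the remaining 3 untouched. Because each round fetches at most min(batchSize, remaining limit) segments, the task never holds more than one batch in memory, which is the OOM concern behind the third benefit.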
