suneet-s commented on PR #14662: URL: https://github.com/apache/druid/pull/14662#issuecomment-1659564965
@zachjsh - First, thank you for making improvements in this area! I have a few comments and I have not yet read the code in detail. I think some additional details in the PR description about how you envision these changes to be used will be helpful in reviewing the patch. > > @zachjsh I really suggest breaking up this PR into smaller PRs where each PR captures a single improvement. (Also when a PR title is `Various Auto kill improvements` it really is telling you nothing much) > > @maytasm , I changed the title to capture the change more succinctly; here we are adding a new kill type task which allows a list of provided segmentIds to be killed, and this new task type is used in the new coordinator duty. @zachjsh I agree with @maytasm. It would be good to split this patch up into multiple PRs to make it easier to review. > Also added a CoordinatorDynamicConfig property `killTaskSlotRatio`, which allows the user to control the maximum number of kill task slots in use in the cluster, as spawned by Druid's auto kill coordinator duty. This considers the total task capacity in the cluster, with autoscaling if configured, and allows for only that respective number of active kill tasks at a given time in the cluster, as spawned by the auto kill duty. ^ This part of the patch seems substantial enough to be it's own PR, and should definitely be called out in the release notes, but as written, it is very easy to miss this. wrt `killTaskSlotRatio`, have you considered combining this with the `compactTaskSlotRatio` so that an operator has one knob to tune how much capacity in the cluster should be given to "data management" operations? Compaction tasks also offer a maxCompactTaskSlot config, if we do decide to have a different set of knobs for kill tasks - is there a reason not to also introduce a maxKillTaskSlots config? Since we are now introducing a config to limit the number of concurrent kill tasks - is there any reason the kill duty should not run more frequently, say every minute by default? Are there any guardrails preventing an operator from setting the task slots used for data management operations higher than the available capacity of the cluster? > Implement new Kill_by_segment_ids task and use in auto kill coordinator duty WRT the `kill_by_segment_id` task - Why does this task need a different name from the kill task? Have you considered making the kill task accept an interval and a limit instead of a list of segment ids? This seems like an easier interface for operators to use and reason about. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
