nishantmonu51 opened a new issue #8801: KubernetesTaskRunner for running druid tasks as kubernetes jobs
URL: https://github.com/apache/incubator-druid/issues/8801

### Motivation

The talk below by @Jinchul81 outlines a way to autoscale Druid MiddleManagers by emitting Druid metrics: https://www.slideshare.net/Hadoop_Summit/apache-druid-auto-scaleoutin-for-streaming-data-ingestion-on-kubernetes

However, this approach has some limitations, especially when selecting MiddleManagers (MMs) to scale down, as discussed towards the end of the slides.

* A MiddleManager can only be scaled down once all of its tasks have completed.

Another requirement with MMs is over-provisioning, where `required workerCapacity = 2 * replicas * taskCount`, to accommodate one set of tasks publishing while another set is reading.

### Proposed changes

This proposal is to implement a KubernetesTaskRunner as part of a new extension, druid-kubernetes.

* The KubernetesTaskRunner will use the Kubernetes API to submit jobs directly to the Kubernetes cluster using the Java kubernetes-client: https://github.com/kubernetes-client/java
* Each Kubernetes Job will run only a Peon process.
* The KubernetesTaskRunner will use Watch ([link](https://github.com/kubernetes-client/java/blob/master/examples/src/main/java/io/kubernetes/client/examples/WatchExample.java)) to watch for status changes of the submitted jobs.
* Task logs can also be streamed to the console using existing Kubernetes APIs ([link](https://github.com/kubernetes-client/java/blob/master/examples/src/main/java/io/kubernetes/client/examples/LogsExample.java)).
* For replica tasks, Kubernetes anti-affinity can be used to make sure that replica tasks are assigned to different instances.
* Once the job completes, its allocated resources will be freed and can be assigned to other tasks.
* Instead of the MM pushing the task logs to log storage on completion, the Overlord will fetch the logs and push them to log storage.

### Rationale

* KubernetesTaskRunner would help with better resource utilization in the cloud, as we need not allocate larger pods that can host multiple tasks but are mostly under-utilized.
* Kubernetes [ClusterAutoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) can be used to add more nodes whenever needed.
  * If enough resources are not available, tasks will remain in the pending state and cluster-autoscaler will create more nodes for running these pods.
* No over-provisioning is needed to run publishing and reading tasks simultaneously.

### Operational impact

* Simplified autoscaling in the cloud.
* No need to manage extra configuration for MiddleManagers in the cloud.
* This is proposed as a Kubernetes extension, with the hope of adding better cloud support for Druid. It will be an optional, opt-in feature, and no changes to core Druid are required.
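
To make the over-provisioning formula from the Motivation section concrete, here is a minimal arithmetic sketch. The class and method names are hypothetical and not part of Druid; the example values (`replicas = 2`, `taskCount = 3`) are chosen only for illustration. It assumes the formula is read as stated above: one set of `replicas * taskCount` tasks reading while a second set of the same size is publishing.

```java
// Sketch only: WorkerCapacitySketch and its methods are hypothetical names,
// used to illustrate the formula from the Motivation section.
final class WorkerCapacitySketch {
    private WorkerCapacitySketch() {}

    /** Number of tasks reading at a given moment: one set of replicas * taskCount. */
    static int readingTasks(int replicas, int taskCount) {
        if (replicas < 1 || taskCount < 1) {
            throw new IllegalArgumentException("replicas and taskCount must be >= 1");
        }
        return Math.multiplyExact(replicas, taskCount);
    }

    /** required workerCapacity = 2 * replicas * taskCount (reading set plus publishing set). */
    static int requiredWorkerCapacity(int replicas, int taskCount) {
        return Math.multiplyExact(2, readingTasks(replicas, taskCount));
    }

    public static void main(String[] args) {
        int replicas = 2;   // example value
        int taskCount = 3;  // example value
        System.out.println("reading tasks: " + readingTasks(replicas, taskCount));
        System.out.println("required workerCapacity: " + requiredWorkerCapacity(replicas, taskCount));
    }
}
```

With these example values, 6 tasks are reading at a time, but MiddleManagers must be provisioned with 12 task slots. Under the proposal, each task would instead hold resources only for the lifetime of its own Job.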
