nishantmonu51 opened a new issue #8801: KubernetesTaskRunner for running druid 
tasks as kubernetes jobs
URL: https://github.com/apache/incubator-druid/issues/8801
 
 
   ### Motivation
   The talk below by @Jinchul81 outlines a way to autoscale Druid 
MiddleManagers (MMs) by emitting Druid metrics:
   
https://www.slideshare.net/Hadoop_Summit/apache-druid-auto-scaleoutin-for-streaming-data-ingestion-on-kubernetes
   
   However, this approach has some limitations, especially when selecting 
MMs to scale down, as discussed in the final slides:
   * A MiddleManager can be scaled down only after all of its tasks have 
completed.
   
   Another requirement with MMs is over-provisioning: `required 
workerCapacity = 2 * replicas * taskCount`, to accommodate one set of tasks 
publishing while another set is reading.
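
As a worked example of the formula above, here is a minimal sketch. The class and method names are hypothetical; only the formula itself comes from this issue.

```java
// Hypothetical helper illustrating the over-provisioning formula quoted above.
final class WorkerCapacity {

    // required workerCapacity = 2 * replicas * taskCount
    // The factor 2 covers one set of tasks publishing while the next set reads.
    static int required(int replicas, int taskCount) {
        return Math.multiplyExact(2, Math.multiplyExact(replicas, taskCount));
    }

    public static void main(String[] args) {
        // 2 replicas of 3 reading tasks need 12 worker slots on the MMs.
        System.out.println(required(2, 3)); // prints 12
    }
}
```

With 2 replicas and a taskCount of 3, only 6 slots are reading at any moment, but 12 must stay provisioned for the publishing overlap.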
   
   ### Proposed changes
   
   This proposal is to implement a KubernetesTaskRunner as part of a new 
extension, druid-kubernetes.
   
   * The KubernetesTaskRunner will use the Kubernetes API to submit jobs 
directly to the Kubernetes cluster, using the Java kubernetes-client: 
https://github.com/kubernetes-client/java
   * Each Kubernetes Job will run only a Peon process.
   * The KubernetesTaskRunner will use Watch 
([link](https://github.com/kubernetes-client/java/blob/master/examples/src/main/java/io/kubernetes/client/examples/WatchExample.java))
 to watch for status changes of the submitted jobs.
   * Task logs can also be streamed to the console using existing 
Kubernetes APIs 
([link](https://github.com/kubernetes-client/java/blob/master/examples/src/main/java/io/kubernetes/client/examples/LogsExample.java)).
   * For replica tasks, Kubernetes pod anti-affinity can be used to make sure 
that replicas are assigned to different instances.
   * Once a job completes, its allocated resources are freed and can be 
assigned to other tasks.
   * Instead of the MM pushing task logs to log storage on completion, the 
Overlord will fetch the logs and push them to log storage.
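
To make the Job and anti-affinity points above more concrete, here is a rough sketch of the kind of Job manifest such a task runner might submit. This is an illustration under assumptions, not the proposed implementation: the class, the `druid-replica-group` label, and the container name are hypothetical, and the Peon command line is left out because the proposal does not specify it. The manifest fields themselves (`batch/v1` Job, `podAntiAffinity`, `topologyKey`) are standard Kubernetes.

```java
import java.util.Locale;

// Hypothetical helper: none of these names come from Druid or from this proposal.
final class PeonJobManifest {

    // Kubernetes object names must be lowercase DNS-style labels of at most
    // 63 characters, while Druid task ids may contain '_' and uppercase letters.
    static String toK8sName(String id) {
        String s = id.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9-]", "-");
        s = s.replaceAll("^-+", "");
        if (s.length() > 63) {
            s = s.substring(0, 63);
        }
        return s.replaceAll("-+$", "");
    }

    // Renders a one-pod Job. Pods that share the same replica-group label are
    // kept on different nodes by the required pod anti-affinity rule.
    static String render(String taskId, String replicaGroup, String image) {
        String name = toK8sName(taskId);
        String group = toK8sName(replicaGroup);
        return String.join("\n",
            "apiVersion: batch/v1",
            "kind: Job",
            "metadata:",
            "  name: \"" + name + "\"",
            "spec:",
            "  backoffLimit: 0  # assumption: retries are left to Druid, not Kubernetes",
            "  template:",
            "    metadata:",
            "      labels:",
            "        druid-replica-group: \"" + group + "\"  # hypothetical label",
            "    spec:",
            "      restartPolicy: Never",
            "      affinity:",
            "        podAntiAffinity:",
            "          requiredDuringSchedulingIgnoredDuringExecution:",
            "            - labelSelector:",
            "                matchLabels:",
            "                  druid-replica-group: \"" + group + "\"",
            "              topologyKey: kubernetes.io/hostname",
            "      containers:",
            "        - name: peon",
            "          image: \"" + image + "\"",
            "          # Peon command line omitted: not specified in the proposal",
            "");
    }

    public static void main(String[] args) {
        // Example values only; the image name is a placeholder.
        System.out.println(render("index_kafka_Wiki_0", "wiki-0", "example/druid:latest"));
    }
}
```

Using `kubernetes.io/hostname` as the topology key spreads replicas across nodes; a zone-level key could be used instead if replicas should survive a zone failure.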
   
   ### Rationale
   * KubernetesTaskRunner would improve resource utilization in the cloud, 
since we would not need to allocate larger pods that host multiple tasks and 
are mostly under-utilized.
   * Kubernetes 
[ClusterAutoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler)
 can be used to add more nodes whenever needed.
     * If not enough resources are available, task pods will remain in the 
Pending state and cluster-autoscaler will create more nodes to run them.
   * No over-provisioning is required to allow publishing and reading tasks 
to run simultaneously.
   
   ### Operational impact
   
   * Simplified autoscaling in the cloud.
   * No need to manage extra configuration for MiddleManagers in the cloud.
   * This is proposed as a Kubernetes extension, with the hope of adding 
better cloud support for Druid. It will be an optional, opt-in feature, and 
no changes to core Druid are required.
   

