churromorales opened a new pull request, #13156:
URL: https://github.com/apache/druid/pull/13156
### Description
Add an extension to allow tasks to be run as k8s jobs from the overlord,
eliminating the need for a middle manager.
The core changes are as follows:
1. Refactored arguments to CliPeon to be more generic
2. Added setup and cleanup methods to AbstractTask.
a. Because tasks run on separate pods, each task needs to set up its own
filesystem directories.
b. For the same reason, the task itself pushes its logs and task reports
in the cleanup method.
3. A few other small changes to core required for tasks to run independently
on their own.
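The setup/cleanup split in point 2 can be sketched roughly as follows. This is a minimal Python sketch with hypothetical names, not Druid's actual Java API: each task prepares its own directories before running and pushes its logs and reports afterwards, since there is no middle manager on the pod to do either.

```python
import tempfile
from pathlib import Path


class AbstractTask:
    """Sketch of a task lifecycle with setup/cleanup hooks.

    All names here are illustrative, not Druid's real API.
    """

    def __init__(self, task_id: str, base_dir: str):
        self.task_id = task_id
        self.task_dir = Path(base_dir) / task_id

    def setup(self) -> None:
        # The task runs alone on its pod, so it must create
        # its own working directories.
        (self.task_dir / "log").mkdir(parents=True, exist_ok=True)
        (self.task_dir / "report").mkdir(parents=True, exist_ok=True)

    def cleanup(self) -> None:
        # With no middle manager on the pod, the task itself pushes
        # its logs and reports (e.g. to deep storage). Placeholder here.
        pass

    def run_task(self) -> str:
        raise NotImplementedError

    def run(self) -> str:
        self.setup()
        try:
            return self.run_task()
        finally:
            self.cleanup()


class NoopTask(AbstractTask):
    def run_task(self) -> str:
        (self.task_dir / "log" / "task.log").write_text("ran\n")
        return "SUCCESS"


base = tempfile.mkdtemp()
status = NoopTask("task-1", base).run()
print(status)  # SUCCESS
```

The key point is that `setup()` and `cleanup()` bracket `run_task()` unconditionally, so logs and reports get pushed even when the task fails.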
### How it works
The KubernetesTaskRunner runs in the overlord process. When it receives a
request to launch a task, it calls the K8s API and grabs the overlord's own
PodSpec, modifies the necessary attributes (e.g. command, labels, env
variables), compresses and base64-encodes the task.json, and launches a K8s
Job.
On startup, the K8s Job unwraps the task.json env variable, writes it to
the appropriate directory, and runs the task.
The KubernetesTaskRunner monitors the lifecycle of the task, just as the
ForkingTaskRunner does, and returns the TaskStatus.
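The task.json hand-off described above can be illustrated with a small round-trip sketch (Python for brevity; the extension itself does this on the JVM): the overlord compresses and base64-encodes the spec into an env variable value, and the peon decodes and decompresses it on startup before writing it to disk.

```python
import base64
import gzip
import json

task_spec = {"type": "noop", "id": "task-1"}

# Overlord side: compress + base64-encode task.json so it fits
# safely in an env variable on the Job spec.
encoded = base64.b64encode(
    gzip.compress(json.dumps(task_spec).encode("utf-8"))
).decode("ascii")

# Peon side: decode + decompress the env variable, then the task
# would write the result out as task.json and run.
decoded = json.loads(gzip.decompress(base64.b64decode(encoded)))

assert decoded == task_spec
print(decoded["id"])  # task-1
```

Compressing before encoding matters because base64 inflates the payload by ~33% and task specs can be large.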
#### What if you are running Sidecars?
The config option `druid.indexer.runner.sidecarSupport` enables launching
sidecars. I use kubexit (https://github.com/karlkfi/kubexit) to set up the
spec so that when the main container completes, it terminates the sidecars.
Jobs whose sidecars outlive the main container are a known issue with k8s,
and this is how I work around it.
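For readers unfamiliar with kubexit, the pattern looks roughly like the YAML below. This is a hedged sketch of how kubexit is typically wired up, not the extension's exact output: every container's entrypoint is wrapped with the kubexit binary, containers share a "graveyard" volume for tombstone files, and the sidecar declares a death dependency on the main container.

```yaml
# Sketch of the kubexit pattern (illustrative paths and names).
spec:
  volumes:
    - name: graveyard
      emptyDir:
        medium: Memory
  containers:
    - name: main
      command: ["/kubexit/kubexit", "/peon-entrypoint.sh"]
      env:
        - name: KUBEXIT_NAME
          value: main
        - name: KUBEXIT_GRAVEYARD
          value: /graveyard
      volumeMounts:
        - name: graveyard
          mountPath: /graveyard
    - name: sidecar
      command: ["/kubexit/kubexit", "/sidecar-entrypoint.sh"]
      env:
        - name: KUBEXIT_NAME
          value: sidecar
        - name: KUBEXIT_GRAVEYARD
          value: /graveyard
        - name: KUBEXIT_DEATH_DEPS
          value: main  # terminate when "main" writes its tombstone
      volumeMounts:
        - name: graveyard
          mountPath: /graveyard
```

When the main container exits, kubexit writes its tombstone to the shared graveyard, and the sidecar's kubexit wrapper sees the death dependency satisfied and kills the sidecar process, letting the Job complete.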
#### Another nice side-effect
Because the launching of tasks has been decoupled from the service itself,
the tasks run independently of the state of the overlord process. You can
shut down the overlord, and when it comes back up it will query the k8s API
for the status of all peon jobs regardless of phase (in flight, completed,
failed, pending), do the proper bookkeeping for completed tasks, and resume
monitoring tasks in flight.
To run a middle-manager-less Druid, simply omit the middle manager from your
deployment.
Make sure you also set
`druid.processing.intermediaryData.storage.type=deepStorage`
In your overlord config:
1. Add `druid-kubernetes-overlord-extensions` to your extensions
load list.
2. `druid.indexer.runner.type=k8s`
3. `druid.indexer.runner.namespace=<currentNamespace>`
4. `druid.indexer.task.enableTaskLevelLogPush=true` (optional but
recommended)
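Putting the overlord settings above together, the runtime.properties fragment would look roughly like this (the namespace value is a placeholder, and `loadList` should include whatever other extensions you already use; `sidecarSupport` is only needed if you run sidecars):

```properties
druid.extensions.loadList=["druid-kubernetes-overlord-extensions"]
druid.indexer.runner.type=k8s
druid.indexer.runner.namespace=<currentNamespace>
druid.indexer.runner.sidecarSupport=false
druid.indexer.task.enableTaskLevelLogPush=true
```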
This PR has:
- [x] been self-reviewed.
- [x] using the [concurrency
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
- [x] added documentation for new or modified features or behaviors.
- [x] added Javadocs for most classes and all non-trivial methods. Linked
related entities via Javadoc links.
- [x] added or updated version, license, or notice information in
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [x] added comments explaining the "why" and the intent of the code
wherever would not be obvious for an unfamiliar reader.
- [x] added unit tests or modified existing tests to cover new code paths,
ensuring the threshold for [code
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
is met.
- [ ] added integration tests. (Integration tests have been added, but the
k8s integration tests only work on a Linux machine because they use
`conntrack` with minikube, so I will have to let Travis run and figure
things out from there.)
- [x] been tested in a test Druid cluster.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]