[PR] Separate task lifecycle from kubernetes/location lifecycle (druid)

via GitHub Wed, 11 Oct 2023 11:50:34 -0700


georgew5656 opened a new pull request, #15133:
URL: https://github.com/apache/druid/pull/15133


   ### Description
   When running higher volumes of ingestion on the KubernetesTaskRunner 
(especially streaming) there are some issues caused by the difference between 
the Kubernetes lifecycle (pod startup and completion) and the Druid Task 
lifecycle (when a peon JVM has spun up and is ready to serve requests and when 
it has shut down)
   
   - During streaming task startup, in AbstractTask.setup, the 
getChatHandlerProvider gets registered after the UpdateLocation action 
submission. This can cause issues if there is a lot of load on the overlord 
because the task will get stuck retrying these /action submissions even though 
its chat handler has not been registered and the supervisor can't actually 
communicate with the task yet.
   
   - Similarly, the UpdateLocation action during AbstractTask.cleanUp also 
frequently causes issues during streaming task cleanup when there is a lot of 
load on the overlord. The cleanUp method is called after the chat handler 
provider is deregistered, so when the task gets stuck doing cleanup, there is a 
risk of the supervisor trying to chat with the task while it is in the process 
of existing.
   
   
   -On larger Kubernetes clusters, it can take a while for K8s to report that a 
pod has successfully exited, meaning there can be a significant lag between 
when a peon JVM exits and when the KubernetesTaskRunner can report a task as 
completed. In general this slows down how quickly tasks can be reported as 
successful and can also cause similar issues to the above UpdateLocation 
actions with streaming tasks.
   
   
   
   There is a tradeoff between having the peon hit the /action endpoint on the 
overlord with UpdateStatusAction and UpdateLocationAction to give the K8s task 
runner a more accurate account of where the peon is in the task lifecycle vs 
the time/chance of failure that these requests add.
   
   My overall approach was to let the 
KubernetesTaskRunner/KubernetesPeonLifeycle (stuff running on the overlord) 
handle the Kubernetes/TaskLocation lifecycle, but have the peon be directly 
responsible for the task lifecycle by using the UpdateStatusAction as a way to 
mark the task future as complete.
   
   Following this approach, I made two significant changes
   - I removed all the UpdateLocationAction calls in the peon and let the k8s 
task runner handle managing the tasks location itself.  Specifically, the k8s 
task runner now notifies listeners that a task's location has changed when the 
k8s pod gets its ip and when the k8s pod lifecycle completes.
   - I separated the k8s lifecycle logic from the druid task lifecycle logic. 
The exec service in KubernetesTaskRunner is still responsible for submitting a 
K8s job for a task, waiting for the pod to come up, and then deleting the pod 
when it completes, but run(Task) no longer returns this future as the status of 
the Task. Instead, run(Task) returns a settable future that gets completed 
either when the Kubernetes lifecycle completes or when the UpdateStatusAction 
is sent from the peon. This means that tasks will now complete when the peon 
finishes its cleanup and notifies the k8s task runner that it has completed.
   
   I also made a few small cleanup changes
   - registerListener should call notifyStatus on the new listener for running 
tasks.
   - Removed the PeonPhase class since it wasn't serving any purpose.
   
   #### Release note
   - Cleanup lifecycle management of tasks in mm-less task scheduling
   
   ##### Key changed/added classes in this PR
    * `KubernetesTaskRunner`
    * `KubernetesPeonLifecycle`
    * `AbstractTask`
    *  `KubernetesWorkItem`
   
   
   This PR has:
   
   - [X] been self-reviewed.
      - [ ] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
 (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] a release note entry in the PR description.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [X] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [X] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] added integration tests.
   - [X] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[PR] Separate task lifecycle from kubernetes/location lifecycle (druid)

Reply via email to