georgew5656 opened a new pull request, #15133:
URL: https://github.com/apache/druid/pull/15133
### Description
When running higher volumes of ingestion on the KubernetesTaskRunner
(especially streaming), some issues arise from the mismatch between the
Kubernetes lifecycle (pod startup and completion) and the Druid task lifecycle
(when a peon JVM has spun up and is ready to serve requests, and when it has
shut down):
- During streaming task startup, in AbstractTask.setup, the chat handler
provider is registered after the UpdateLocation action is submitted. Under
heavy overlord load, the task can get stuck retrying these /action submissions
while its chat handler is still unregistered, so the supervisor cannot
actually communicate with the task yet.
- Similarly, the UpdateLocation action during AbstractTask.cleanUp frequently
causes issues during streaming task cleanup under heavy overlord load. The
cleanUp method is called after the chat handler provider is deregistered, so
when the task gets stuck doing cleanup, there is a risk of the supervisor
trying to chat with the task while it is in the process of exiting.
- On larger Kubernetes clusters, it can take a while for K8s to report that a
pod has successfully exited, so there can be a significant lag between when a
peon JVM exits and when the KubernetesTaskRunner can report the task as
completed. In general this slows down how quickly tasks can be reported as
successful and can also cause issues similar to the UpdateLocation problems
above for streaming tasks.
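The startup ordering problem described in the first bullet can be illustrated with a minimal sketch (this is illustrative only, not Druid's actual `AbstractTask` code; class and method names here are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the problematic startup ordering: the peon announces
// its location to the overlord before its chat handler exists, leaving a
// window where the supervisor sees a location but cannot reach the task.
class PeonSetupSketch {
  final List<String> events = new ArrayList<>();

  void setup() {
    // Under heavy overlord load, this /action submission may retry for a
    // long time...
    events.add("submit UpdateLocationAction");
    // ...while the HTTP chat handler is still unregistered, so the
    // supervisor cannot talk to the task yet.
    events.add("register chat handler");
  }
}
```

The window exists precisely because the location announcement precedes the chat handler registration; the PR sidesteps it by removing the peon-side location announcement entirely.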
There is a tradeoff between having the peon hit the /action endpoint on the
overlord with UpdateStatusAction and UpdateLocationAction, which gives the K8s
task runner a more accurate view of where the peon is in the task lifecycle,
and the latency and failure risk those requests add.
My overall approach was to let the
KubernetesTaskRunner/KubernetesPeonLifecycle (the code running on the
overlord) handle the Kubernetes/TaskLocation lifecycle, while making the peon
directly responsible for the task lifecycle by using the UpdateStatusAction to
mark the task future as complete.
Following this approach, I made two significant changes:
- I removed all the UpdateLocationAction calls in the peon and let the k8s
task runner manage the task's location itself. Specifically, the k8s task
runner now notifies listeners that a task's location has changed when the k8s
pod gets its IP and when the k8s pod lifecycle completes.
- I separated the k8s lifecycle logic from the Druid task lifecycle logic.
The exec service in KubernetesTaskRunner is still responsible for submitting a
K8s job for a task, waiting for the pod to come up, and then deleting the pod
when it completes, but run(Task) no longer returns this future as the status
of the task. Instead, run(Task) returns a settable future that is completed
either when the Kubernetes lifecycle completes or when the UpdateStatusAction
is sent from the peon. This means that tasks now complete as soon as the peon
finishes its cleanup and notifies the k8s task runner that it is done.
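The settable-future handoff described above could be sketched roughly as follows. This is not Druid's actual implementation; `java.util.concurrent.CompletableFuture` stands in here for the settable future (Druid itself uses Guava futures), and the class and method names are illustrative:

```java
import java.util.concurrent.CompletableFuture;

// Rough sketch of decoupling the K8s lifecycle from the Druid task lifecycle:
// run(Task) hands back a future that is completed by whichever signal arrives
// first -- the peon's UpdateStatusAction, or the K8s pod lifecycle finishing.
class KubernetesTaskRunnerSketch {

  CompletableFuture<String> run(String taskId) {
    // In the real runner, an exec service would asynchronously submit the K8s
    // job, wait for the pod's IP (notifying location listeners), and delete
    // the pod on exit. Here we only model the returned settable future.
    return new CompletableFuture<>();
  }

  // Called when the overlord receives the peon's UpdateStatusAction: the peon
  // has finished its cleanup, so the task can be reported complete now.
  void onUpdateStatus(CompletableFuture<String> statusFuture, String status) {
    statusFuture.complete(status);
  }

  // Called when the K8s pod lifecycle completes. complete() is first-wins, so
  // this is a no-op fallback if the peon already reported its status.
  void onKubernetesLifecycleComplete(CompletableFuture<String> statusFuture,
                                     String status) {
    statusFuture.complete(status);
  }
}
```

Because `complete()` has first-wins semantics, a fast-reporting peon finishes the task immediately, while the slower K8s pod-completion path still acts as a safety net if the peon dies without reporting.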
I also made a few small cleanup changes:
- registerListener should call notifyStatus on the new listener for running
tasks.
- Removed the PeonPhase class since it wasn't serving any purpose.
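The registerListener fix above amounts to replaying current state to a listener that registers late. A minimal sketch of that idea, with hypothetical names (not the actual TaskRunner listener interface):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: a listener registered after tasks have already started
// is immediately notified of their current status, instead of only hearing
// about future changes and missing the tasks that are already running.
class ListenerRegistrySketch {
  interface Listener {
    void statusChanged(String taskId, String status);
  }

  private final Map<String, String> runningTasks = new HashMap<>();
  private final List<Listener> listeners = new ArrayList<>();

  void taskStarted(String taskId) {
    runningTasks.put(taskId, "RUNNING");
    listeners.forEach(l -> l.statusChanged(taskId, "RUNNING"));
  }

  void registerListener(Listener listener) {
    listeners.add(listener);
    // Replay current state so late registrants don't miss running tasks.
    runningTasks.forEach(listener::statusChanged);
  }
}
```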
#### Release note
- Cleanup lifecycle management of tasks in mm-less task scheduling
##### Key changed/added classes in this PR
* `KubernetesTaskRunner`
* `KubernetesPeonLifecycle`
* `AbstractTask`
* `KubernetesWorkItem`
This PR has:
- [X] been self-reviewed.
- [ ] using the [concurrency
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
(Remove this item if the PR doesn't have any relation to concurrency.)
- [ ] added documentation for new or modified features or behaviors.
- [ ] a release note entry in the PR description.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked
related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [X] added comments explaining the "why" and the intent of the code
wherever would not be obvious for an unfamiliar reader.
- [X] added unit tests or modified existing tests to cover new code paths,
ensuring the threshold for [code
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
is met.
- [ ] added integration tests.
- [X] been tested in a test Druid cluster.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]