georgew5656 opened a new pull request, #14895:
URL: https://github.com/apache/druid/pull/14895
### Description
The mm-less task runner differs in behavior from the other runners that run
on the overlord because it tries to handle the cleanup lifecycle on its own by
immediately deleting K8s jobs and clearing its tasks map as soon as it has
finished running a task.
The HttpRemoteTaskRunner and RemoteTaskRunner don't do this. Instead, they
rely on the TaskQueue's handlers for the futures that all the task runners
return to call shutdown on the task once it has completed.
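
The handler-driven cleanup described above can be sketched roughly as follows.
This is a minimal illustration, not Druid's actual code: `SketchRunner` and
`SketchQueue` are hypothetical stand-ins, and the JDK's `CompletableFuture` is
used here only to keep the example self-contained. The point is the ordering:
the run future reports a status, and the queue's callback records the location
before asking the runner to clean up.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-ins; these are NOT the real Druid classes.
class SketchRunner {
    // taskId -> location; entries stay until shutdown() is called.
    final Map<String, String> tasks = new ConcurrentHashMap<>();

    CompletableFuture<String> run(String taskId, String location) {
        tasks.put(taskId, location);
        // The run future only reports a status; it performs no cleanup.
        return CompletableFuture.completedFuture("SUCCESS");
    }

    String getLocation(String taskId) {
        return tasks.getOrDefault(taskId, "unknown");
    }

    void shutdown(String taskId) {
        tasks.remove(taskId);
    }
}

class SketchQueue {
    final Map<String, String> persistedLocations = new ConcurrentHashMap<>();

    void manage(SketchRunner runner, String taskId, String location) {
        runner.run(taskId, location).thenAccept(status -> {
            // Record the location while the runner still tracks the task...
            persistedLocations.put(taskId, runner.getLocation(taskId));
            // ...then let the queue, not the runner, trigger the cleanup.
            runner.shutdown(taskId);
        });
    }
}
```

If the runner removed the task from its map before completing the future, the
callback in this sketch would record "unknown" as the location, which mirrors
the missing-location problem described in this PR.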
Updating the mm-less task runner to use logic more similar to the other task
runners has a couple of benefits.
- The task location (including the k8sPodName) is successfully persisted to
taskStorage in TaskQueue.notifyStatus. Currently the mm-less task runner
reports no location in this function call because its run lifecycle will have
already cleaned up the K8s Job and its tasks map.
- Currently, when a task completes, the TaskQueue handler tries to call
shutdown on the K8s task runner after the runner has already shut down the
task. This produces a number of "Ignoring request to cancel unknown task" log
messages and seems likely to cause unexpected behavior in the future.
Changing the logic removes this duplication.
**Changes**
- In KubernetesPeonLifecycle, stop calling shutdown (to delete the K8s job)
after a job has finished.
- In KubernetesTaskRunner.doTask, stop removing the taskId from tasks in the
logic of the run future.
- In KubernetesTaskRunner.shutdown, remove taskId from tasks in addition to
calling shutdown on the job. When TaskQueue calls this shutdown function,
everything belonging to the task is cleaned up as expected.
- Remove the shutdownRequested flag from KubernetesWorkItem since we can now
treat the presence (or absence) of the taskId in the tasks map as an
indicator of whether the task was shut down.
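
As a rough sketch of the last two changes, assuming a much simplified model:
`MapBackedRunner` and all of its methods are hypothetical names chosen for
illustration, and a list of deleted job names stands in for the real K8s job
deletion. It shows shutdown doing both cleanup steps, and map membership
taking over the role of a separate shutdownRequested flag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model; names do not mirror the real Druid API.
class MapBackedRunner {
    // taskId -> K8s job name; an entry exists until shutdown() removes it.
    private final Map<String, String> tasks = new ConcurrentHashMap<>();

    // Stands in for calls that would delete the K8s job.
    final List<String> deletedJobs = new ArrayList<>();

    void register(String taskId, String jobName) {
        tasks.put(taskId, jobName);
    }

    // Removes the task from the map AND deletes its job, in one place.
    // Returns false if the task is unknown or was already shut down.
    boolean shutdown(String taskId) {
        String jobName = tasks.remove(taskId);
        if (jobName == null) {
            return false;
        }
        deletedJobs.add(jobName);
        return true;
    }

    // Map membership replaces a separate flag. Note that in this sketch a
    // task that was never registered also counts as shut down.
    boolean wasShutDown(String taskId) {
        return !tasks.containsKey(taskId);
    }
}
```

A second shutdown call for the same task is a no-op in this sketch, so a
single caller (the queue's handler) can own cleanup without the runner
racing it.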
#### Release note
Update mm-less task runner lifecycle logic to better match the logic in the
HTTP and Zookeeper worker task runners.
##### Key changed/added classes in this PR
* `KubernetesPeonLifecycle`
* `KubernetesTaskRunner`
* `KubernetesWorkItem`
<hr>
This PR has:
- [X] been self-reviewed.
- [ ] using the [concurrency
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
(Remove this item if the PR doesn't have any relation to concurrency.)
- [ ] added documentation for new or modified features or behaviors.
- [ ] a release note entry in the PR description.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked
related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [ ] added comments explaining the "why" and the intent of the code
wherever would not be obvious for an unfamiliar reader.
- [X] added unit tests or modified existing tests to cover new code paths,
ensuring the threshold for [code
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
is met.
- [ ] added integration tests.
- [X] been tested in a test Druid cluster.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]