[PR] Move some lifecycle management from doTask -> shutdown for the mm-less task runner (druid)

via GitHub Tue, 22 Aug 2023 10:39:21 -0700


georgew5656 opened a new pull request, #14895:
URL: https://github.com/apache/druid/pull/14895


   ### Description
   The mm-less task runner differs in behavior from the other runners that run 
on the overlord because it tries to handle the cleanup lifecycle on its own by 
immediately deleting k8s jobs and clearing its tasks map as soon as it has 
finished running a task.
   
   The HttpRemoteTaskRunner and RemoteTaskRunner don't do this. Instead, they 
rely on the TaskQueue's handlers for the futures that all the task runners 
return to call shutdown on the task once it has completed.
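
The pattern the other runners rely on can be sketched roughly as follows. This is a minimal illustration only: `SketchRunner` and `SketchTaskQueue` are hypothetical stand-ins, not Druid's real classes, and `CompletableFuture` is used here just to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for a task runner; not Druid's real interface.
interface SketchRunner {
    CompletableFuture<String> run(String taskId);   // completes with a status string
    void shutdown(String taskId, String reason);    // cleans up the task's resources
}

// Hypothetical stand-in for the queue that owns the completion handlers.
class SketchTaskQueue {
    final List<String> log = new ArrayList<>();

    // Attach a completion handler to the runner's future. Cleanup is driven
    // from this handler, not from inside the runner's own run logic.
    void manage(SketchRunner runner, String taskId) {
        runner.run(taskId).whenComplete((status, error) -> {
            // Record the final status first, while the runner still knows the task.
            log.add("notifyStatus:" + taskId + ":" + (error == null ? status : "FAILED"));
            // Only then ask the runner to clean up.
            runner.shutdown(taskId, "task completed");
        });
    }
}
```

In this arrangement the runner only has to keep its bookkeeping intact until `shutdown` is called on it.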
   
   Updating the mm-less task runner to use logic more similar to the other task 
runners has a couple of benefits.
   - The task location (including the k8sPodName) is successfully persisted to 
taskStorage in TaskQueue.notifyStatus. Currently the mm-less task runner 
reports no location in this function call because its run lifecycle will have 
already cleaned up the K8s Job and its tasks map.
   - Currently, when a task completes, the taskQueue handler will try to call 
shutdown on the k8s task runner after the runner has already shut down the 
task. This creates a bunch of "Ignoring request to cancel unknown task" logs 
and in general seems likely to cause unexpected behavior in the future. 
Changing the logic will remove this duplication.
   
   **Changes**
   - In KubernetesPeonLifecycle, stop calling shutdown (to delete the K8s job) 
after a job has finished.
   - In KubernetesTaskRunner.doTask, stop removing the taskId from tasks in the 
logic of the run future.
   - In KubernetesTaskRunner.shutdown, remove taskId from tasks in addition to 
calling shutdown on the job. When taskQueue calls this shutdown function, 
everything in the task will be cleaned up as expected.
   - Remove the shutdownRequested flag from KubernetesWorkItem since we can now 
treat the presence (or absence) of the taskId in the tasks map as an 
indicator of whether the task was shut down.
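
As a rough sketch of the resulting behavior, assuming a heavily simplified model: `SketchK8sRunner`, `locationOf`, `isJobDeleted` and the pod-name format below are hypothetical and are not the actual `KubernetesTaskRunner` code.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, heavily simplified model of the runner after this change.
class SketchK8sRunner {
    // taskId -> pod name; stands in for the tasks map of work items.
    private final Map<String, String> tasks = new ConcurrentHashMap<>();
    // Records deleted jobs; stands in for calls to the Kubernetes API.
    private final Set<String> deletedJobs = ConcurrentHashMap.newKeySet();

    CompletableFuture<String> run(String taskId) {
        tasks.put(taskId, "pod-" + taskId);
        // The run path no longer removes the entry or deletes the job on completion.
        return CompletableFuture.completedFuture("SUCCESS");
    }

    // Still answerable after the task finishes, so a caller can persist the location.
    String locationOf(String taskId) {
        return tasks.get(taskId);
    }

    // Single cleanup point: forget the task and delete its job.
    // Absence from the map replaces a separate "shutdown requested" flag.
    boolean shutdown(String taskId) {
        String pod = tasks.remove(taskId);
        if (pod == null) {
            return false; // already shut down, or never known
        }
        deletedJobs.add(taskId);
        return true;
    }

    boolean isJobDeleted(String taskId) {
        return deletedJobs.contains(taskId);
    }
}
```

In this model the location stays readable between task completion and the `shutdown` call, and a second `shutdown` for the same task is a no-op.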
   
   
   #### Release note
   Update mm-less task runner lifecycle logic to better match the logic in the 
HTTP and Zookeeper worker task runners.
   
   ##### Key changed/added classes in this PR
    * `KubernetesPeonLifecycle`
    * `KubernetesTaskRunner`
    * `KubernetesWorkItem`
   
   <hr>
   
   This PR has:
   
   - [X] been self-reviewed.
      - [ ] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
 (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] a release note entry in the PR description.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [X] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] added integration tests.
   - [X] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

