jasonk000 opened a new pull request #12096:
URL: https://github.com/apache/druid/pull/12096


   ### Description
   
   Improve the performance of `RemoteTaskRunner::tryAssignTask`, which consumes 
CPU for long periods on the Overlord during a task restart operation.
   
   Profiler screenshot showing a long busy period on the `rtr-pending-..` task thread:
   
![image](https://user-images.githubusercontent.com/3196528/147289897-61da95d8-3a0a-4d4c-9b94-b4679316936e.png)
   
   Screenshot of the profiler flamegraph for this thread, showing 100% of CPU 
time spent in the `tryAssignTask` loop:
   
![image](https://user-images.githubusercontent.com/3196528/147289985-1ee07872-9acb-4a07-81e7-fdf419dac0b2.png)
   
   ##### Key changed/added classes in this PR
   
   This change:
   1. eliminates the triple nested call to `getRunningTasks()` in 
`ZkWorker::toImmutable`, and
   2. reduces the work performed in `ZkWorker::isRunningTask` by parsing only 
the `id` field instead of the entire ZkWorker JSON.
   
   By eliminating this extra work, the loop is much tighter.
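   
   As a rough illustration of the first point, the sketch below contrasts rebuilding an expensive view once per derived field with building it once and reusing it. Everything here (`Worker`, `announce`, `summarizeNaive`, `summarizeOnce`, the fields computed) is hypothetical and chosen for illustration only; it is not the actual `ZkWorker` code from this PR.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class RunningTasksSnapshotSketch {

    /** Hypothetical stand-in for a worker whose running-task view is costly to build. */
    static class Worker {
        // Task id -> capacity it occupies (assumed shape, for illustration only).
        private final Map<String, Integer> announced = new HashMap<>();
        // Counts how many times the expensive view was rebuilt.
        final AtomicInteger rebuilds = new AtomicInteger();

        void announce(String taskId, int capacity) {
            announced.put(taskId, capacity);
        }

        /** Simulates the expensive call: every invocation copies the full map. */
        Map<String, Integer> getRunningTasks() {
            rebuilds.incrementAndGet();
            return new HashMap<>(announced);
        }

        /** Before: each derived field triggers its own rebuild (three in total). */
        int[] summarizeNaive() {
            int taskCount = getRunningTasks().size();
            int capacityUsed = sum(getRunningTasks());
            int largest = max(getRunningTasks());
            return new int[] {taskCount, capacityUsed, largest};
        }

        /** After: build the view once and derive every field from that one copy. */
        int[] summarizeOnce() {
            Map<String, Integer> running = getRunningTasks();
            return new int[] {running.size(), sum(running), max(running)};
        }

        private static int sum(Map<String, Integer> tasks) {
            return tasks.values().stream().mapToInt(Integer::intValue).sum();
        }

        private static int max(Map<String, Integer> tasks) {
            return tasks.values().stream().mapToInt(Integer::intValue).max().orElse(0);
        }
    }

    public static void main(String[] args) {
        Worker worker = new Worker();
        worker.announce("task-a", 1);
        worker.announce("task-b", 2);

        worker.summarizeNaive();
        System.out.println("naive rebuilds: " + worker.rebuilds.getAndSet(0));
        worker.summarizeOnce();
        System.out.println("single-snapshot rebuilds: " + worker.rebuilds.get());
    }
}
```

   Both methods return the same values; the difference is only in how often the underlying view is rebuilt, which matters when the method sits inside a loop over every worker and every pending task.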
   
   This change is related to the following mailing list discussion:
   https://lists.apache.org/thread/9jgdwrodwsfcg98so6kzfhdmn95gzyrj
   
   
   ##### Tests
   
   The existing tests in `RemoteTaskRunner*Test.java` cover this functionality.
   
   <hr>
   
   This PR has:
   - [x] been self-reviewed.
   - [x] been tested in a test Druid cluster (as a part of a larger block of 
changes).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


