jasonk000 opened a new pull request #12096: URL: https://github.com/apache/druid/pull/12096
### Description

Improve the performance of `RemoteTaskRunner::tryAssignTask`, which consumes long periods of CPU on the Overlord during a task restart operation.

Screenshot of profiler showing a long period of activity on the `rtr-pending-..` task thread.

Screenshot of profile flamegraph for this thread, showing 100% of CPU in the `tryAssignTask` loop.

##### Key changed/added classes in this PR

This change:

1. eliminates the triple nested call of `getRunningTasks()` in `ZkWorker::toImmutable`, and
2. reduces the work performed in `ZkWorker::isRunningTask` by parsing only the `id` field instead of the entire ZkWorker JSON.

By eliminating this extra work, the loop is much tighter.

This change is coupled to this mailing list discussion: https://lists.apache.org/thread/9jgdwrodwsfcg98so6kzfhdmn95gzyrj

##### Tests

Tests in `RemoteTaskRunner*Test.java` capture this functionality.

<hr>

This PR has:

- [x] been self-reviewed.
- [x] been tested in a test Druid cluster (as a part of a larger block of changes).
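
For readers unfamiliar with the pattern, the first change listed above (calling `getRunningTasks()` once instead of three times when building an immutable snapshot) can be illustrated with a minimal sketch. This is not the Druid code: `SketchWorker`, `SketchSnapshot`, the `id|group|capacity` announcement format, and the `derivations` counter are all hypothetical names introduced only to show why a derived view should be computed once per snapshot.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical immutable snapshot; not Druid's ImmutableWorkerInfo.
final class SketchSnapshot {
    final int capacityUsed;
    final Set<String> groups;
    final List<String> taskIds;

    SketchSnapshot(int capacityUsed, Set<String> groups, List<String> taskIds) {
        this.capacityUsed = capacityUsed;
        this.groups = groups;
        this.taskIds = taskIds;
    }
}

// Hypothetical stand-in for a worker that tracks task announcements; not Druid's ZkWorker.
class SketchWorker {
    private final List<String> announcements = new ArrayList<>();
    int derivations = 0; // counts how often the expensive view is rebuilt

    void announce(String taskId, String group, int capacity) {
        announcements.add(taskId + "|" + group + "|" + capacity);
    }

    // Expensive on purpose: re-parses every announcement on each call.
    Map<String, String[]> getRunningTasks() {
        derivations++;
        Map<String, String[]> tasks = new LinkedHashMap<>();
        for (String a : announcements) {
            String[] parts = a.split("\\|");
            tasks.put(parts[0], parts);
        }
        return tasks;
    }

    private static int usedCapacity(Map<String, String[]> tasks) {
        int total = 0;
        for (String[] parts : tasks.values()) {
            total += Integer.parseInt(parts[2]);
        }
        return total;
    }

    private static Set<String> groups(Map<String, String[]> tasks) {
        Set<String> result = new LinkedHashSet<>();
        for (String[] parts : tasks.values()) {
            result.add(parts[1]);
        }
        return result;
    }

    // Before: each field rebuilds the running-task view, so it is derived three times.
    SketchSnapshot snapshotNaive() {
        return new SketchSnapshot(
            usedCapacity(getRunningTasks()),
            groups(getRunningTasks()),
            new ArrayList<>(getRunningTasks().keySet()));
    }

    // After: derive the view once and reuse it for every field.
    SketchSnapshot snapshotOnce() {
        Map<String, String[]> tasks = getRunningTasks();
        return new SketchSnapshot(
            usedCapacity(tasks),
            groups(tasks),
            new ArrayList<>(tasks.keySet()));
    }
}

class SketchDemo {
    public static void main(String[] args) {
        SketchWorker worker = new SketchWorker();
        worker.announce("task-a", "group-1", 1);
        worker.announce("task-b", "group-2", 2);

        worker.snapshotNaive();
        System.out.println("naive derivations: " + worker.derivations);

        worker.derivations = 0;
        worker.snapshotOnce();
        System.out.println("single-pass derivations: " + worker.derivations);
    }
}
```

Both methods produce the same snapshot; only the number of times the underlying view is rebuilt differs. When the snapshot is taken inside a tight assignment loop, that constant factor is multiplied by the number of workers and pending tasks, which is consistent with the profile described in the PR.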
