jasonk000 opened a new pull request #12097:
URL: https://github.com/apache/druid/pull/12097


   ### Description
   
   Improve the performance of `TaskQueue::manage` on large installations by 
removing unnecessary String concat / construction.
   
   Once the changes in #12096 are applied, the performance of the `TaskQueue` 
loop becomes much tighter, and is dominated by the logging calls.
   
   Screenshot of profiler showing `TaskQueue-Mana...` thread:
   
![image](https://user-images.githubusercontent.com/3196528/147293850-fce1bff4-9205-4a45-9f1f-44b09a1612f5.png)
   
   Screenshot of profile flamegraph for this thread, highlighting the `format` 
calls in the stack:
   
![image](https://user-images.githubusercontent.com/3196528/147293905-2ddeeede-4bf8-4619-be89-224c69c08ab4.png)
   
   During the task clean up loop, the `shutdown()` call is issued multiple 
times to the `RemoteTaskRunner` - note especially it uses the three-argument 
invocation of `shutdown()` method:
   
   
https://github.com/apache/druid/blob/476d0bf4be4199e97695bd568d165cda98523d37/indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskQueue.java#L328-L335
   
   This hits the default implementation, which constructs a "reason" argument 
and passes it on to the two-argument `shutdown()`:
   
   
https://github.com/apache/druid/blob/476d0bf4be4199e97695bd568d165cda98523d37/indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskRunner.java#L91-L94
   
   In particular on large clusters with thousands of tasks, this `default void 
shutdown` call performs a `StringUtils.format` on a task set with thousands of 
task IDs, and it serialises all of them to a String.
   
   Even in the event this log line is turned off, the log is still constructed, 
and then discarded.
   
   ### Key changed/added classes in this PR
   
   This introduces an `@Override public void shutdown()` with arguments, and 
has it perform the log construction and issue only if the info level is 
enabled. This results in significantly lower CPU consumption in this loop.
   
   This follows up the changes in #12096, and also follows the mailing list 
discussion here:
   https://lists.apache.org/thread/9jgdwrodwsfcg98so6kzfhdmn95gzyrj
   
   This PR has:
   - [x] been self-reviewed.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [x] been tested in a test Druid cluster (as part of a larger block of 
changes).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to