mghosh4 opened a new issue #10378:
URL: https://github.com/apache/druid/issues/10378


   ### Motivation
   
   The primary motivation of this work is to provide more visibility into 
worker utilization over time. Monitoring utilization can help cluster 
administrators decide when to add workers to the pool or remove them from it. 
This has become even more important as native ingestion is adopted more widely. 
   
   ### Proposed changes
   
   I propose adding a new `WorkerCountStatsMonitor` to the Overlord, similar to 
the existing `TaskCountStatsMonitor` class. It will expose the following metrics:
   
   - `worker/total/count`: Total number of workers 
   - `worker/idle/count`: Total number of workers available to accept tasks 
   - `worker/used/count`: Total number of workers currently in use
   - `worker/lazy/count`: Total number of workers that have been marked lazy 
   - `worker/blacklisted/count`: Total number of workers that have been 
blacklisted
   
   ### Proposed Design
   
   I plan to add the following APIs to `TaskRunner`:
   
   ```
     /**
      * APIs used to emit statistics for {@link WorkerCountStatsMonitor}.
      */
     long getTotalWorkerCount();
   
     long getIdleWorkerCount();
   
     long getUsedWorkerCount();
   
     long getLazyWorkerCount();
   
     long getBlacklistedWorkerCount();
   ```
   
   The implementation of `WorkerCountStatsMonitor` will be similar to 
`TaskCountStatsMonitor`. It will use the `WorkerCountStatsProvider` interface, 
which will be implemented by `TaskMaster`. `TaskMaster` will use `taskRunner` 
to emit the required metrics.
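
A minimal, self-contained sketch of how the pieces described above could fit together. Everything here is an assumption made for illustration: the methods on `WorkerCountStatsProvider` simply mirror the proposed `TaskRunner` APIs, `MetricSink` is a hypothetical stand-in for Druid's emitter, and `FixedCounts` and `collect` exist only to exercise the sketch. It does not use Druid's actual monitor base class or emitter types.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: none of these types are Druid's real classes.
public class WorkerCountStatsSketch {

  /** Assumed shape of the provider; the proposal names the interface but not its methods. */
  interface WorkerCountStatsProvider {
    long getTotalWorkerCount();
    long getIdleWorkerCount();
    long getUsedWorkerCount();
    long getLazyWorkerCount();
    long getBlacklistedWorkerCount();
  }

  /** Hypothetical stand-in for the metric emitter. */
  interface MetricSink {
    void emit(String metric, long value);
  }

  /** Reads each count once per monitoring pass and emits it under the proposed metric name. */
  static final class WorkerCountStatsMonitor {
    private final WorkerCountStatsProvider provider;

    WorkerCountStatsMonitor(WorkerCountStatsProvider provider) {
      this.provider = provider;
    }

    void doMonitor(MetricSink sink) {
      sink.emit("worker/total/count", provider.getTotalWorkerCount());
      sink.emit("worker/idle/count", provider.getIdleWorkerCount());
      sink.emit("worker/used/count", provider.getUsedWorkerCount());
      sink.emit("worker/lazy/count", provider.getLazyWorkerCount());
      sink.emit("worker/blacklisted/count", provider.getBlacklistedWorkerCount());
    }
  }

  /** Fixed-value provider, standing in for a TaskMaster that delegates to its task runner. */
  static final class FixedCounts implements WorkerCountStatsProvider {
    private final long total;
    private final long idle;
    private final long used;
    private final long lazy;
    private final long blacklisted;

    FixedCounts(long total, long idle, long used, long lazy, long blacklisted) {
      this.total = total;
      this.idle = idle;
      this.used = used;
      this.lazy = lazy;
      this.blacklisted = blacklisted;
    }

    @Override public long getTotalWorkerCount() { return total; }
    @Override public long getIdleWorkerCount() { return idle; }
    @Override public long getUsedWorkerCount() { return used; }
    @Override public long getLazyWorkerCount() { return lazy; }
    @Override public long getBlacklistedWorkerCount() { return blacklisted; }
  }

  /** Runs one monitoring pass and collects the emitted metrics in emission order. */
  static Map<String, Long> collect(WorkerCountStatsProvider provider) {
    Map<String, Long> out = new LinkedHashMap<>();
    new WorkerCountStatsMonitor(provider).doMonitor((name, value) -> out.put(name, value));
    return out;
  }

  public static void main(String[] args) {
    collect(new FixedCounts(5, 2, 3, 1, 0))
        .forEach((name, value) -> System.out.println(name + " = " + value));
  }
}
```

Keeping the monitor dependent only on the provider interface, rather than on `TaskRunner` directly, is what would let `TaskMaster` decide which runner to query.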
   
   ### Operational impact
   
   This change has no operational impact; it only adds new metrics for 
better cluster monitoring.
   
   ### Test plan 
   
   In addition to unit tests, we plan to test this in our local Druid 
clusters, using the statsd-based emitter framework to collect the newly 
emitted metrics for visualization.
   

