abhishekrb19 commented on PR #16691: URL: https://github.com/apache/druid/pull/16691#issuecomment-2259455745
@kfaraz, apologies for the delay in getting back. The docs recommend at least 16 [vCPUs](https://druid.apache.org/docs/latest/tutorials/cluster/#data-server) for data servers, so there will be at least 2 loading threads by default in production clusters. As to how much overlap there is between the time spent by loading threads, I'm not sure. Here are a few exploratory thoughts/ideas to simplify and track this more accurately:

1. How about tracking the load rate directly on the historicals/data servers? I see you have listed that as a potential approach for the future. Besides being more useful and accurate, I think it's also relatively straightforward to implement. Given that the `SegmentLoadDropHandler` code already processes batches of change requests, I think we can piggyback on that logic to add some tracking there. It also avoids introducing another notion of "batch" in the coordinator's `HttpLoadQueuePeon` if we decide to revive that idea. One downside is that the rate computed on the historicals won't account for the end-to-end time (e.g., time spent in the queue). If that is significant, we could perhaps track a separate metric on the coordinator.
2. If we want to compute the aggregated rate from the coordinator, we could expose an internal API on the historicals that the coordinator can query for the relevant info (number of loading threads, number of segment batches processed, etc.). However, I think this approach might be overkill.

Please let me know what you think.
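To make idea 1 concrete, here is a minimal sketch (not actual Druid code; the class and method names are hypothetical) of the kind of tracker a `SegmentLoadDropHandler`-style batch processor could update after each completed batch of load requests:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: aggregates bytes loaded and time spent across
// loading threads, so an aggregate load rate can be emitted as a metric.
public class LoadRateTracker
{
    private final AtomicLong totalBytes = new AtomicLong();
    private final AtomicLong totalMillis = new AtomicLong();

    // Called once per completed batch with the bytes loaded and the time
    // spent loading it; safe to call from concurrent loading threads.
    public void recordBatch(long bytesLoaded, long elapsedMillis)
    {
        totalBytes.addAndGet(bytesLoaded);
        totalMillis.addAndGet(elapsedMillis);
    }

    // Aggregate rate in KB/s (bytes per millisecond); 0 if nothing recorded.
    public long rateKbps()
    {
        final long millis = totalMillis.get();
        return millis == 0 ? 0 : totalBytes.get() / millis;
    }

    public static void main(String[] args)
    {
        LoadRateTracker tracker = new LoadRateTracker();
        tracker.recordBatch(10_000_000, 2_000); // 10 MB loaded in 2 s
        tracker.recordBatch(5_000_000, 1_000);  // 5 MB loaded in 1 s
        System.out.println(tracker.rateKbps()); // 15,000,000 B / 3,000 ms = 5000
    }
}
```

Note that summing per-batch elapsed times overestimates wall-clock time when batches on different loading threads overlap, which is exactly the overlap question raised above; tracking on the historical at least keeps that error local to one server rather than compounding it in the coordinator.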
