abhishekagarwal87 commented on PR #18148: URL: https://github.com/apache/druid/pull/18148#issuecomment-2979064473
I think this solves only a narrow class of problems and adds another parameter that may not see broad adoption. I do, however, agree that query scheduling at the data level is an area worth exploring and tinkering with, though I wonder if there are better ways to solve this problem.

One solution that comes to mind is to use virtual threads of some sort. Right now, we have a processing thread pool that essentially dictates the compute capacity available for segment processing. But if these threads are doing a lot more IO than CPU work, that capacity is being wasted. Java recently gained support for virtual threads, and those could be used to run segment processing instead of directly using OS-level threads.

A higher-level comment is that we shouldn't make this change without some confidence that our solution makes life better for a good number of use cases. You should first build a test setup that can simulate query congestion at the data level, along with metrics that reflect the degree of congestion, throughput, and fairness. Once such a system is in place, that is when you can craft a few strategies and use your test setup to measure which strategy is best.
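To make the virtual-thread idea above concrete, here is a minimal sketch, assuming Java 21 or later. It is not Druid code: `VirtualThreadSketch`, `simulatedSegmentScan`, and `runAll` are hypothetical names, and the 50 ms sleep is only a stand-in for a segment scan that mostly waits on IO. It compares a small fixed pool of OS threads with a virtual-thread-per-task executor on the same IO-bound workload.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class VirtualThreadSketch {

    // Hypothetical stand-in for a segment scan that spends its time waiting on IO.
    static int simulatedSegmentScan(int segmentId) throws InterruptedException {
        Thread.sleep(50); // pretend to block on disk or network
        return segmentId;
    }

    // Submits `tasks` simulated scans, waits for all of them, returns elapsed milliseconds.
    static long runAll(ExecutorService executor, int tasks) throws Exception {
        long start = System.nanoTime();
        List<Future<Integer>> futures = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            final int id = i;
            futures.add(executor.submit(() -> simulatedSegmentScan(id)));
        }
        for (Future<Integer> future : futures) {
            future.get(); // propagate any failure from the task
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        int tasks = 200;

        // A fixed pool caps concurrency at 4 OS threads, even while they only wait.
        ExecutorService fixed = Executors.newFixedThreadPool(4);
        long fixedMs = runAll(fixed, tasks);
        fixed.shutdown();

        // One virtual thread per task: a blocked task does not hold an OS thread.
        ExecutorService virtual = Executors.newVirtualThreadPerTaskExecutor();
        long virtualMs = runAll(virtual, tasks);
        virtual.shutdown();

        System.out.println("fixed pool (4 threads): " + fixedMs + " ms");
        System.out.println("virtual threads:        " + virtualMs + " ms");
    }
}
```

The benefit shown here applies only when tasks block in ways the JVM can unmount, such as sleeping or socket IO. CPU-bound work would still need an explicit concurrency limit, which is one more reason to measure with a test setup before choosing a strategy.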
