JRobTS opened a new issue, #18642: URL: https://github.com/apache/druid/issues/18642
### Description

We started testing the hilo query laning strategy but had to abandon it because multiple requests ended up being rejected with HTTP 429. This feature would be invaluable to us if excess queries were queued instead of rejected outright. Something similar to https://github.com/apache/druid/pull/15440, but with support for lanes.

### Motivation

In testing, just two or three really heavy queries can too easily starve out all other queries on the cluster: instead of the usual 0.2 sec response time, users get response times exceeding 10 seconds.

Query laning (hilo) is the perfect solution because it allows those really heavy queries through while minimizing their impact on the more typical queries. The major drawback, however, is that users receive errors due to the HTTP 429 responses.

A typical use case where this becomes a problem is a daily digest job with multiple heavy queries that run simultaneously. Tools like Looker and Grafana don't easily allow running these queries in sequence, and even if they did, you would want to run queries in parallel most of the time; running in sequence is just an edge case. It can also be difficult for the client to identify the exact impact a query will have on Druid. Ideally, then, Druid itself would prevent the starvation of smaller queries by reserving some capacity for them.

One possible solution is to use the existing hilo query laning strategy but, instead of rejecting excess queries, queue them so they execute in sequence with the reserved capacity.
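
To make the difference between rejecting and queueing concrete, here is a minimal Java sketch. It is only an illustration under stated assumptions: `LowLaneSketch`, `runOrReject`, `runOrQueue`, `maxLowConcurrent`, and `maxLowQueued` are hypothetical names, and the code does not reflect how Druid's query scheduler is implemented internally.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: all names here are illustrative, not part of Druid.
final class LowLaneSketch {
    private final Semaphore running; // low-lane queries allowed to run at once
    private final Semaphore waiting; // low-lane queries allowed to wait for a slot

    LowLaneSketch(int maxLowConcurrent, int maxLowQueued) {
        this.running = new Semaphore(maxLowConcurrent, true); // fair: waiters are served FIFO
        this.waiting = new Semaphore(maxLowQueued);
    }

    // Reject-style behavior: a full lane fails immediately,
    // which is where a client would see HTTP 429.
    boolean runOrReject(Runnable query) {
        if (!running.tryAcquire()) {
            return false;
        }
        return runAndRelease(query);
    }

    // Queue-style behavior: wait for a slot, and give up only when the
    // wait queue is full or the timeout expires.
    boolean runOrQueue(Runnable query, long timeoutMillis) throws InterruptedException {
        // The timed tryAcquire honors fairness, so a newcomer cannot jump ahead of waiters.
        boolean acquired = running.tryAcquire(0, TimeUnit.MILLISECONDS);
        if (!acquired) {
            if (!waiting.tryAcquire()) {
                return false; // wait queue is full
            }
            try {
                acquired = running.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
            } finally {
                waiting.release();
            }
        }
        return acquired && runAndRelease(query);
    }

    private boolean runAndRelease(Runnable query) {
        try {
            query.run();
            return true;
        } finally {
            running.release();
        }
    }
}
```

With one running slot, a second heavy query passed to `runOrReject` fails at once, while the same query passed to `runOrQueue` waits its turn and runs after the first finishes, which matches the "execute in sequence" behavior requested here. The bounded wait queue and timeout keep a burst of heavy queries from piling up indefinitely.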
One heavy query doesn't impact the cluster too much, so we might set:

```
druid.query.scheduler.laning.strategy=hilo
druid.query.scheduler.laning.maxLowPercent=1
druid.query.scheduler.prioritization.strategy=threshold
druid.query.scheduler.prioritization.segmentCountThreshold=200
```

Perhaps we could also add a config option to dedicate a portion of the queue to the low lane:

```
druid.query.scheduler.laning.maxLowQueuePercent=50
```

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
