JRobTS opened a new issue, #18642:
URL: https://github.com/apache/druid/issues/18642

   ### Description
   
   We started testing the hilo query laning strategy but had to abandon it 
because multiple requests ended up being rejected with HTTP 429.
   
   This feature would be invaluable to us if excess queries were queued instead 
of outright rejected.
   
   Something similar to https://github.com/apache/druid/pull/15440 but with 
support for lanes.
   
   
   ### Motivation
   
   In testing, just two or three really heavy queries can easily starve out 
all other queries on the cluster: instead of the usual 0.2-second response 
time, users see response times exceeding 10 seconds.
   
   Query laning (hilo) is the perfect solution because it allows those really 
heavy queries through while minimizing their impact on the more typical 
queries. The major drawback, however, is that users receive errors when excess 
queries are rejected with an HTTP 429 response.
   
   A typical case where this becomes a problem is a daily digest job that 
runs multiple heavy queries simultaneously. Tools like Looker and Grafana 
don't easily allow for running these queries in sequence. Even if they did, 
you would want to run queries in parallel most of the time; running them in 
sequence is only needed in this edge case. It can also be difficult for the 
client to identify the exact impact a query will have on Druid. Ideally, 
Druid itself would prevent the starvation of smaller queries by reserving 
some capacity for them.
   
   One possible solution is to use the existing hilo query laning strategy but 
instead of rejecting excess queries, queue them, so they execute in sequence 
with the reserved capacity.
   
   A single heavy query doesn't impact the cluster too much, so we might set:
   ```
   druid.query.scheduler.laning.strategy=hilo
   druid.query.scheduler.laning.maxLowPercent=1
   druid.query.scheduler.prioritization.strategy=threshold
   druid.query.scheduler.prioritization.segmentCountThreshold=200
   ```
   
   Perhaps we could also add a config option to dedicate a portion of the queue 
to the low lane:
   ```
   druid.query.scheduler.laning.maxLowQueuePercent=50
   ```
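
   To make the requested behavior concrete, here is a minimal Java sketch of a 
low-lane gate that queues excess queries instead of rejecting them. It is 
not Druid code: `QueueingLaneGate`, `percentOf`, the `totalQueueSize` 
parameter, and the round-up rule for percentages are all assumptions made 
for illustration, and `maxLowQueuePercent` is the proposed option from this 
issue, not an existing one.

   ```java
   import java.util.concurrent.Semaphore;
   import java.util.concurrent.TimeUnit;

   /** Hypothetical sketch of a low lane that queues instead of rejecting. */
   final class QueueingLaneGate {
       private final Semaphore running; // low-lane queries allowed to execute at once
       private final Semaphore waiting; // low-lane queries allowed to wait in the queue

       QueueingLaneGate(int totalThreads, int maxLowPercent,
                        int totalQueueSize, int maxLowQueuePercent) {
           // Fair semaphore so that waiting queries are served in arrival order.
           this.running = new Semaphore(percentOf(totalThreads, maxLowPercent), true);
           this.waiting = new Semaphore(percentOf(totalQueueSize, maxLowQueuePercent));
       }

       /** Percentage of a total, rounded up, never below 1 (an assumed rule). */
       public static int percentOf(int total, int percent) {
           return Math.max(1, (total * percent + 99) / 100);
       }

       /**
        * Returns true once the query holds a low-lane slot. Returns false if the
        * low lane's queue share is full, the wait times out, or the thread is
        * interrupted; only then would the caller answer with HTTP 429.
        */
       public boolean acquire(long timeoutMillis) {
           try {
               // Zero-timeout variant honors fairness, unlike the untimed tryAcquire().
               if (running.tryAcquire(0, TimeUnit.MILLISECONDS)) {
                   return true;
               }
               if (!waiting.tryAcquire()) {
                   return false; // queue share exhausted
               }
               try {
                   return running.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
               } finally {
                   waiting.release(); // leaving the queue, with or without a slot
               }
           } catch (InterruptedException e) {
               Thread.currentThread().interrupt();
               return false;
           }
       }

       /** Call when a low-lane query finishes so the next queued query can run. */
       public void release() {
           running.release();
       }

       public static void main(String[] args) {
           // 40 threads at 1% gives 1 running slot; a queue of 4 at 50% gives 2 waiters.
           QueueingLaneGate gate = new QueueingLaneGate(40, 1, 4, 50);
           System.out.println("first heavy query admitted: " + gate.acquire(100));
           System.out.println("second admitted within 100 ms: " + gate.acquire(100));
           gate.release();
       }
   }
   ```

   With the settings above, a second heavy query would wait for the first to 
finish rather than fail immediately, and a 429 would only be returned once 
the low lane's share of the queue is full or the wait times out.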
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
