yashmayya opened a new pull request, #14847:
URL: https://github.com/apache/pinot/pull/14847

   - https://github.com/apache/pinot/pull/14574 introduced a mechanism to 
throttle multi-stage engine queries at the broker level using a new beta 
cluster config. The mechanism treated all queries as equivalent and assumed 
that queries were evenly distributed across brokers.
   - This patch updates the mechanism to take into account the estimated number 
of threads that would be spawned in the servers for a query instead. The beta 
cluster config is also changed to reflect this. This is a better model since 
users no longer need to tweak the config based on their exact query workload 
and can instead use an estimated value based on the cluster size and instance 
sizes. Furthermore, mixed workloads will be better supported with larger 
queries potentially being blocked for longer while smaller queries are executed 
(query starvation and timeout is a possibility with this primitive model 
though).
   - Note that prior to this change, there was actually an issue with the way 
the throttling threshold was determined - each broker calculated the threshold 
as `maxConcurrentQueries * numServers / numBrokers`. Since the max concurrent 
queries is a "per server" config, the calculation incorrectly makes the 
assumption that queries are executed on a single server (instead of assuming 
that they're dispatched to all servers which is a better assumption).
   - With the change here to throttle based on the estimated number of threads, 
the throttling threshold becomes `maxServerQueryThreads * numServers / 
numBrokers` which only makes the assumption that queries are evenly distributed 
across brokers (and no assumptions about the fanout to servers). This feature 
isn't intended to completely prevent large queries from executing so the 
cluster config should be set to a value that is at least large enough to 
accommodate queries that can spawn up to `maxServerQueryThreads * numServers / 
numBrokers` number of threads.
   - An alternate implementation could be to track the estimated number of 
threads on a per server basis (using the worker <-> server mapping in the query 
plan) but this has a lot more edge cases and also much higher overhead on large 
clusters with a large number of servers (since we'd need to acquire permits for 
many servers for every single query).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to