yashmayya opened a new pull request, #14574:
URL: https://github.com/apache/pinot/pull/14574

   - Currently, there is no limit on the number of multi-stage queries that are 
executed concurrently. Every request that comes in to a broker is immediately 
compiled into a query plan and dispatched to the servers for query execution.
   - On the servers, each stage in the query plan is submitted to a cached 
thread pool executor. This means that we don't have an effective way to limit 
the number of threads that are being utilized to execute multi-stage queries on 
servers. This can result in a very large number of multi-stage query executor 
threads in high QPS / complex query workloads leading to resource contention 
and performance degradation.
   - Simply using a fixed thread pool executor instead is not a good solution 
because this could lead to distributed deadlocks and query timeouts / failures. 
Similarly, it isn't trivial to implement the max concurrent query execution 
logic in the servers without considering global ordering.
   - Consider this example from @gortiz - 
     - Given a query `Q1` is formed by stages `S11` and `S12`, where `S11` 
depends on data from `S12`.
     - And a query `Q2` is formed by stages `S21` and `S22`, where `S21` 
depends on data from `S22`
     - If `Q1` and `Q2` arrive at the same time and:
       - Server `Srv1` receives `Q1` first, starts executing `S11` and ends up 
waiting for server `Srv2` to execute `S12`
       - Server `Srv2` receives `Q2` first, starts executing `S21` and ends up 
waiting for server `Srv1` to execute `S22`
       - Server `Srv1` receives `Q2` and would execute `S22`, but sees that it 
has reached max number of concurrent queries, so it waits.
       - Server Srv2 receives `Q1` and would execute `S12`, but sees that it 
has reached max number of concurrent queries, so it waits.
       - This leads to a situation where both the queries are blocked and will 
eventually timeout and fail.
   - This patch takes an alternative approach of limiting the number of 
concurrently executing multi-stage queries in the broker itself.
   - A new cluster level config 
`pinot.multistage.engine.max.concurrent.queries` is introduced. This is 
disabled by default and is currently an opt-in throttling mechanism.
   - If set to a value > 0, the value will be divided among all the brokers - 
i.e., if `pinot.multistage.engine.max.concurrent.queries` is set to `12` and 
there are 3 brokers in the cluster, each broker will be able to execute 4 
multi-stage queries concurrently. If there are already 4 queries currently 
executing, any subsequent incoming queries will wait for a slot to open up.
   - This is implemented using a custom semaphore which reacts to changes in 
the cluster config value or the number of brokers in the cluster in order to 
recalculate the total available permits. A custom semaphore implementation is 
needed here because we want to use the 
[Semaphore::reducePermits](https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/Semaphore.html#reducePermits-int-)
 method (when # of brokers increases or the cluster config value for max 
concurrent queries decreases) which has `protected` access. We need to use this 
method in order to be able to reduce the total number of permits for the 
semaphore without blocking.
   - The name of the cluster config doesn't indicate that it's a broker level 
mechanism and hence can be re-used in the future if we move this logic to the 
servers. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to