yashmayya opened a new pull request, #14574:
URL: https://github.com/apache/pinot/pull/14574
- Currently, there is no limit on the number of multi-stage queries that are
executed concurrently. Every request that comes in to a broker is immediately
compiled into a query plan and dispatched to the servers for query execution.
- On the servers, each stage in the query plan is submitted to a cached
thread pool executor. This means that we don't have an effective way to limit
the number of threads that are being utilized to execute multi-stage queries on
servers. This can result in a very large number of multi-stage query executor
threads in high QPS / complex query workloads leading to resource contention
and performance degradation.
- Simply using a fixed thread pool executor instead is not a good solution
because this could lead to distributed deadlocks and query timeouts / failures.
Similarly, it isn't trivial to implement the max concurrent query execution
logic in the servers without considering global ordering.
- Consider this example from @gortiz -
- Given a query `Q1` is formed by stages `S11` and `S12`, where `S11`
depends on data from `S12`.
- And a query `Q2` is formed by stages `S21` and `S22`, where `S21`
depends on data from `S22`
- If `Q1` and `Q2` arrive at the same time and:
- Server `Srv1` receives `Q1` first, starts executing `S11` and ends up
waiting for server `Srv2` to execute `S12`
- Server `Srv2` receives `Q2` first, starts executing `S21` and ends up
waiting for server `Srv1` to execute `S22`
- Server `Srv1` receives `Q2` and would execute `S22`, but sees that it
has reached max number of concurrent queries, so it waits.
- Server Srv2 receives `Q1` and would execute `S12`, but sees that it
has reached max number of concurrent queries, so it waits.
- This leads to a situation where both the queries are blocked and will
eventually timeout and fail.
- This patch takes an alternative approach of limiting the number of
concurrently executing multi-stage queries in the broker itself.
- A new cluster level config
`pinot.multistage.engine.max.concurrent.queries` is introduced. This is
disabled by default and is currently an opt-in throttling mechanism.
- If set to a value > 0, the value will be divided among all the brokers -
i.e., if `pinot.multistage.engine.max.concurrent.queries` is set to `12` and
there are 3 brokers in the cluster, each broker will be able to execute 4
multi-stage queries concurrently. If there are already 4 queries currently
executing, any subsequent incoming queries will wait for a slot to open up.
- This is implemented using a custom semaphore which reacts to changes in
the cluster config value or the number of brokers in the cluster in order to
recalculate the total available permits. A custom semaphore implementation is
needed here because we want to use the
[Semaphore::reducePermits](https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/Semaphore.html#reducePermits-int-)
method (when # of brokers increases or the cluster config value for max
concurrent queries decreases) which has `protected` access. We need to use this
method in order to be able to reduce the total number of permits for the
semaphore without blocking.
- The name of the cluster config doesn't indicate that it's a broker level
mechanism and hence can be re-used in the future if we move this logic to the
servers.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]