ankitsultana opened a new issue, #15940: URL: https://github.com/apache/pinot/issues/15940
## Summary

It is well known that join queries tend to be costly in both CPU and memory. It is also common for a given Pinot tenant to serve a critical use case backed by the V1 Engine at moderate QPS, while users want to run ad hoc join queries alongside it.

Given this, we are considering adding an option to select a different, compute-only tenant for running the non-leaf stages of an MSE query. To be precise, the idea is:

1. The leaf stage worker assignment will run as is, since it depends purely on segment selection, which the new query optimizer delegates to the Routing Manager.
2. For each non-leaf stage, workers will be assigned to all the servers in a tenant that need not be the same as any of the tenants used by the leaf stage.

A user could configure this through a query option to begin with, but I'd imagine we'd need better ways to let users configure this.

Since the leaf stage already supports streaming its response, the CPU- and memory-heavy intermediate work (e.g. joins) would run on the compute-only tenant rather than on the servers backing the V1 use case. I believe this could unlock a significantly safer way to support ad hoc MSE queries alongside V1 Engine queries.

## Related Discussions

There are some other ideas discussed by the community recently that are related to this, e.g. pinning MSE queries to a specific replica group. I think those ideas are good too and can be delivered independently. With the Physical Query Optimizer, they require no additional work beyond adding the necessary support in the Broker Routing Manager.

## Bigger Picture

From an architecture point of view, this idea moves in a direction where one day one could swap out the query execution engine for the intermediate stages altogether, e.g. use something like Arrow to export data from the leaf stage and DataFusion to run the intermediate stages. We are still some way from getting there, but maybe not that far.
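The two-step worker assignment from the Summary can be sketched roughly as below. This is a minimal illustration in Python for brevity, not Pinot's planner code (Pinot itself is written in Java): the `Stage` class, the `assign_workers` function, the `intermediateStageTenant` option key and the fallback to the leaf servers when no option is set are all hypothetical choices made for this sketch. It assumes segment selection has already produced a `stage_id -> {server: [segments]}` map for the leaf stages.

```python
from dataclasses import dataclass, field
from typing import List, Mapping

# Hypothetical option key: the issue only says "a query option".
INTERMEDIATE_TENANT_OPTION = "intermediateStageTenant"


@dataclass
class Stage:
    stage_id: int
    is_leaf: bool
    workers: List[str] = field(default_factory=list)


def assign_workers(
    stages: List[Stage],
    leaf_routing: Mapping[int, Mapping[str, List[str]]],
    tenant_servers: Mapping[str, List[str]],
    query_options: Mapping[str, str],
) -> List[Stage]:
    """Fill in `workers` for every stage of a (hypothetical) multi-stage plan.

    leaf_routing: stage_id -> {server: [segments]}, i.e. whatever segment
    selection (the Routing Manager, in the issue) has already decided.
    tenant_servers: tenant name -> servers belonging to that tenant.
    """
    # Step 1: leaf stages keep the routing result untouched.
    leaf_servers = set()
    for stage in stages:
        if stage.is_leaf:
            servers = sorted(leaf_routing[stage.stage_id])
            stage.workers = servers
            leaf_servers.update(servers)

    # Step 2: non-leaf stages use every server of the compute-only tenant,
    # which may share no servers with the leaf stage tenants.
    tenant = query_options.get(INTERMEDIATE_TENANT_OPTION)
    if tenant is not None:
        if tenant not in tenant_servers:
            raise ValueError(f"unknown tenant: {tenant}")
        intermediate = sorted(tenant_servers[tenant])
    else:
        # Assumed fallback for this sketch only: reuse the leaf servers.
        intermediate = sorted(leaf_servers)

    for stage in stages:
        if not stage.is_leaf:
            stage.workers = list(intermediate)
    return stages
```

For example, with a join stage `Stage(1, False)` fed by two leaf scans `Stage(2, True)` and `Stage(3, True)`, setting the option to a tenant such as `adhocCompute` places stage 1 on that tenant's servers, while stages 2 and 3 stay on the servers chosen by routing. Keeping the leaf assignment a pure function of the routing result is what lets the intermediate tenant be changed without touching segment selection.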
