ankitsultana opened a new issue, #15940:
URL: https://github.com/apache/pinot/issues/15940

   ## Summary
   
   Join queries tend to be costly in both CPU and memory. It is also common for 
a given Pinot tenant to serve a critical use case backed by the V1 Engine at 
moderate QPS, while users also want to run ad hoc join queries alongside it.
   
   Given this, we are considering adding an option to select a different, 
compute-only tenant for running the non-leaf stages of an MSE query.
   
   To be precise, the idea is:
   
   1. The leaf stage worker assignment will run as is, since it is dependent 
purely on segment selection, which the new query optimizer delegates to the 
Routing Manager.
   2. For each non-leaf stage, the workers will be all the servers in a tenant 
that may differ from every tenant used by the leaf stage. To begin with, a user 
could configure this through a query option, but I'd imagine we will need 
better ways for users to configure it.
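   
   The two steps above can be sketched roughly as follows. This is only an 
illustration, not Pinot code: the class name, the `computeTenant` query option 
key, and the tenant-to-servers map are all hypothetical, and the fallback to the 
leaf servers when the option is absent is an assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class ComputeTenantAssignmentSketch {
  // Hypothetical query option key; not an existing Pinot option.
  static final String COMPUTE_TENANT_OPTION = "computeTenant";

  /**
   * Picks the workers for a non-leaf stage.
   *
   * @param queryOptions    query options supplied by the user
   * @param serversByTenant hypothetical lookup of tenant name to server ids
   * @param leafServers     servers already chosen for the leaf stage by
   *                        segment routing (left untouched, as in step 1)
   */
  static List<String> assignNonLeafWorkers(Map<String, String> queryOptions,
      Map<String, List<String>> serversByTenant, List<String> leafServers) {
    String tenant = queryOptions.get(COMPUTE_TENANT_OPTION);
    if (tenant == null) {
      // Assumption for this sketch: without the option, non-leaf stages
      // run on the (deduplicated, sorted) leaf servers.
      return new ArrayList<>(new TreeSet<>(leafServers));
    }
    List<String> servers = serversByTenant.get(tenant);
    if (servers == null || servers.isEmpty()) {
      throw new IllegalArgumentException("No servers found for tenant: " + tenant);
    }
    // Step 2: use all the servers of the compute-only tenant.
    return new ArrayList<>(new TreeSet<>(servers));
  }

  public static void main(String[] args) {
    Map<String, List<String>> serversByTenant = Map.of(
        "servingTenant", List.of("server-1", "server-2"),
        "computeTenant", List.of("compute-1", "compute-2", "compute-3"));
    List<String> leafServers = List.of("server-1", "server-2");

    System.out.println(assignNonLeafWorkers(
        Map.of(COMPUTE_TENANT_OPTION, "computeTenant"), serversByTenant, leafServers));
    System.out.println(assignNonLeafWorkers(Map.of(), serversByTenant, leafServers));
  }
}
```
   
   The point of the sketch is that the leaf assignment is an input that is never 
modified; only the non-leaf worker list depends on the selected tenant.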
   
   Since the leaf stage already supports streaming the response, I believe this 
could unlock a significantly safer way to support ad hoc MSE queries alongside 
V1 Engine queries.
   
   ## Related Discussions
   
   The community has recently discussed some other ideas that are related to 
this.
   
   E.g. pinning MSE queries to a specific replica-group. I think those ideas 
are good too and can be delivered independently. With the Physical Query 
Optimizer, they require no additional work outside of adding the necessary 
support in the Broker Routing Manager.
   
   ## Bigger Picture
   
   From an architecture point of view, this idea is going in a direction where 
one day one could swap out the query execution engine altogether for the 
intermediate stages. E.g. use something like Arrow for exporting data from the 
leaf stage, and use DataFusion to run the intermediate stages.
   
   We are a bit far from getting there, but maybe not that far.


