gianm commented on pull request #9773:
URL: https://github.com/apache/druid/pull/9773#issuecomment-622457445


   By the way, let me discuss for a bit what this cost-estimation stuff is 
trying to accomplish, in case that's helpful.
   
   In Druid native queries, in order for a join tree (multiple stacked joins) 
to execute without subqueries, it must be left-heavy and its bottom-leftmost 
datasource must be a table. We are trying to encourage the SQL planner to put 
things in that form whenever possible. "Must be left-heavy" means that the 
right child of every join should be a regular datasource, not a join. The left 
child can be a join. "Bottom-leftmost must be a table" means that if you 
follow the chain of joins down the left-child pointers, the leaf datasource 
you end up at must be a table.
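   
   To make the two conditions concrete, here is a rough sketch of the check in 
Python. The dicts loosely mirror the JSON shape of native datasources, and the 
function name and the table/lookup names are made up for illustration. This is 
not the planner's actual code.

```python
def is_join(ds):
    return ds["type"] == "join"


def runs_without_subquery(ds):
    """True if the tree is left-heavy and its bottom-leftmost leaf is a table."""
    while is_join(ds):
        if is_join(ds["right"]):
            return False  # right child is a join, so the tree is not left-heavy
        ds = ds["left"]   # keep following the left-child pointers
    return ds["type"] == "table"


# Hypothetical datasources, for illustration only.
table = {"type": "table", "name": "sales"}
lookup_a = {"type": "lookup", "lookup": "country_names"}
lookup_b = {"type": "lookup", "lookup": "product_names"}

left_heavy = {
    "type": "join",
    "left": {"type": "join", "left": table, "right": lookup_a},
    "right": lookup_b,
}
right_heavy = {
    "type": "join",
    "left": table,
    "right": {"type": "join", "left": lookup_a, "right": lookup_b},
}
lookup_at_bottom = {"type": "join", "left": lookup_a, "right": lookup_b}

print(runs_without_subquery(left_heavy))        # True
print(runs_without_subquery(right_heavy))       # False
print(runs_without_subquery(lookup_at_bottom))  # False
```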
   
   What if we need to apply some filters or projections to that table? In Druid 
native queries, we do that by adding `filter` or `virtualColumns` in the query 
that references the outermost join datasource.
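   
   For example, something like the following, built as a Python dict and 
printed as JSON. The field names are as I remember them from the native query 
docs, so treat them as assumptions. The table, lookup, and column names are 
invented.

```python
import json

# Join of a base table with a lookup. All names here are made up.
join_datasource = {
    "type": "join",
    "left": {"type": "table", "name": "sales"},
    "right": {"type": "lookup", "lookup": "country_names"},
    "rightPrefix": "r.",
    "condition": "\"country_code\" == \"r.k\"",
    "joinType": "INNER",
}

query = {
    "queryType": "scan",
    "dataSource": join_datasource,
    "intervals": ["2020-01-01/2020-02-01"],
    # The virtual column and the filter both sit on the query,
    # above the join, rather than inside the join datasource.
    "virtualColumns": [
        {
            "type": "expression",
            "name": "v0",
            "expression": "upper(\"r.v\")",
            "outputType": "STRING",
        }
    ],
    "filter": {"type": "selector", "dimension": "channel", "value": "web"},
    "columns": ["__time", "channel", "v0"],
}

print(json.dumps(query, indent=2))
```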
   
   But it is important, for performance, to push filters and projections down 
past a join when a query is executed. We do this at the native query level: if 
you have a filter or virtual column on a join that could apply to the base 
table, we apply it to the base table.
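   
   A heavily simplified sketch of that decision, in Python: a filter can go to 
the base table if the column it reads does not come from the right-hand side of 
any join, which here means it does not start with any join's `rightPrefix`. 
This only models single-column filters and assumes join types where filtering 
the left side first is safe (e.g. INNER). The real logic handles more cases, 
and the function and column names below are hypothetical.

```python
def collect_right_prefixes(ds):
    """Gather the rightPrefix of every join along the left spine of the tree."""
    prefixes = []
    while ds["type"] == "join":
        prefixes.append(ds["rightPrefix"])
        ds = ds["left"]
    return prefixes


def split_filters(filters, ds):
    """Split filters into (pushable to base table, must stay above the join)."""
    prefixes = collect_right_prefixes(ds)
    pushable, remaining = [], []
    for f in filters:
        if any(f["dimension"].startswith(p) for p in prefixes):
            remaining.append(f)  # reads a right-side column: keep above the join
        else:
            pushable.append(f)   # reads only a base-table column: push down
    return pushable, remaining


# Hypothetical join and filters, for illustration only.
join_ds = {
    "type": "join",
    "left": {"type": "table", "name": "sales"},
    "right": {"type": "lookup", "lookup": "country_names"},
    "rightPrefix": "r.",
}
on_base = {"type": "selector", "dimension": "channel", "value": "web"}
on_right = {"type": "selector", "dimension": "r.v", "value": "France"}

pushable, remaining = split_filters([on_base, on_right], join_ds)
print(pushable)   # only the filter on "channel"
print(remaining)  # only the filter on "r.v"
```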
   
   So we want the SQL planner to generate queries where projects and filters 
are completely above the join, and we'll push them down later, at the native 
query execution stage.
   
   On the other hand, what if the SQL planner generates projects and filters 
below a join? That's bad, because the only way the SQL-to-native translation 
can handle that structure is by generating a subquery, and subqueries require 
transferring their full results to the Broker.
   
   So I think broadly we have to do one of two things.
   
   1. Figure out a way to compute costs that encourages the SQL planner to 
consistently place projects and filters above joins.
   
   2. Extend the native query language so you can specify a filter or 
virtualColumn at the level of a join datasource, so even if the SQL planner 
generates a rel tree with a filter randomly in the middle, the SQL-to-native 
translation can attach the filter to the join datasource and does not need to 
generate a subquery.
   
   IMO (1) is nice since it keeps the native language simple. But if it's not 
workable, we might need to go with (2).

