gianm commented on pull request #9773: URL: https://github.com/apache/druid/pull/9773#issuecomment-622457445
By the way, let me discuss for a bit what this cost-estimation stuff is trying to accomplish, in case that's helpful.

In Druid native queries, in order for a join tree (multiple stacked joins) to execute without subqueries, it must be left-heavy and the bottom-leftmost datasource must be a table. We are trying to encourage the SQL planner to put things in that form whenever possible.

"Must be left-heavy" means that the right child should be a regular datasource, not a join. The left child could potentially be a join.

"Bottom-leftmost must be a table" means that if you follow the chain of joins down the left-child pointers, the final, leaf regular datasource must be a table.

What if we need to apply some filters or projections to that table? In Druid native queries, we do that by adding `filter` or `virtualColumns` in the query that references the outermost join datasource. But it is important, for performance, to push filters and projections down past a join when a query is executed. We do this at the native query level: if you have a filter or virtual column on a join that could apply to the base table, we apply it to the base table. So we want the SQL planner to generate queries where projects and filters are completely above the join, and we'll push them down later, at the native query execution stage.

On the other hand, what if the SQL planner generates projects and filters below a join? That's bad, because the only way the SQL-to-native translation can handle that structure is by generating a subquery, and subqueries require transferring the full results of the subquery to the Broker.

So I think broadly we have to do one of two things.

1. Figure out a way to compute costs that encourages the SQL planner to push projects and filters above joins consistently.
2. Extend the native query language so you can specify a `filter` or `virtualColumns` at the level of a join datasource, so even if the SQL planner generates a rel tree with a filter randomly in the middle, the SQL-to-native translation can attach the filter to the join datasource and does not need to generate a subquery.

IMO (1) is nice since it keeps the native language simple. But if it's not workable, we might need to go with (2).
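
To make the two shape conditions concrete, here is a minimal Python sketch of the check described above. It is only an illustration: `Table`, `Lookup`, `Join`, and `runs_without_subquery` are hypothetical names invented for this example, not Druid classes, and the real logic lives in the Java planner and native query code.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Table:
    """A table datasource (hypothetical stand-in, not a Druid class)."""
    name: str


@dataclass(frozen=True)
class Lookup:
    """A regular, non-table datasource such as a lookup (hypothetical stand-in)."""
    name: str


@dataclass(frozen=True)
class Join:
    """A join of two datasources; either child may itself be a Join."""
    left: "DataSource"
    right: "DataSource"


DataSource = Union[Table, Lookup, Join]


def runs_without_subquery(ds: DataSource) -> bool:
    """Check the two shape conditions from the comment for a join tree."""
    node = ds
    while isinstance(node, Join):
        # Left-heavy: the right child must be a regular datasource, not a join.
        if isinstance(node.right, Join):
            return False
        # Follow the left-child pointer down the chain.
        node = node.left
    # Bottom-leftmost: the leaf at the end of the left chain must be a table.
    return isinstance(node, Table)


# Left-heavy, with a table at the bottom-left: satisfies both conditions.
left_heavy = Join(Join(Table("t"), Lookup("a")), Lookup("b"))

# The right child is itself a join: violates "must be left-heavy".
right_heavy = Join(Table("t"), Join(Table("u"), Lookup("a")))

# The left chain ends in a lookup: violates "bottom-leftmost must be a table".
lookup_base = Join(Lookup("a"), Lookup("b"))
```

Under these assumptions, `runs_without_subquery(left_heavy)` holds, while the other two trees fail the check and would need a subquery somewhere.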
