[
https://issues.apache.org/jira/browse/IMPALA-7831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Paul Rogers reassigned IMPALA-7831:
-----------------------------------
Assignee: (was: Paul Rogers)
> Revisit expression rewriting integration with planner
> -----------------------------------------------------
>
> Key: IMPALA-7831
> URL: https://issues.apache.org/jira/browse/IMPALA-7831
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 3.0
> Reporter: Paul Rogers
> Priority: Major
>
> The planner performs expression rewriting. It appears that the rewrite engine
> was added late in planner development, as an add-on step in
> {{AnalysisContext}} after we create the plan. Since that time, it appears
> that a number of fixes and patches have been applied to work around the
> inevitable bugs that resulted from this placement of the logic.
> At present, the planner flow, with rewrites, is:
> * Analyze the entire query
> * Assign WHERE clause "conjuncts" to scan nodes, etc.
> * Create the full plan
> * Rewrite the SELECT, WHERE, HAVING and GROUP BY clauses
> * Throw away the plan created above and create a new one
> This ticket proposes to adjust the flow to incorporate rewrites earlier in
> the process, allowing the planner to make a single pass over the query.
> This will solve a number of bugs described in the associated tickets.
> h4. Background
> The above logic evolved because of a timing issue: once we assign conjuncts,
> we have plan nodes that point to the original WHERE clause expressions. We
> later rewrite these, but we do so by throwing away the original nodes,
> replacing them with new ones. Since the scan and other nodes still have a
> pointer to the old version, the rewrites can have no effect.
> To work around this, the code throws away that original plan and replans
> using the new, rewritten nodes.
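
The stale-pointer problem described above can be illustrated with a small toy model. This is only a sketch, not Impala code: `Expr`, `ScanNode` and `rewrite` are hypothetical names invented for the illustration, and the constant-folding rule is a stand-in for the real rewrite engine.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Expr:
    """Hypothetical immutable expression node; a rewrite returns a new object."""
    text: str


@dataclass
class ScanNode:
    """Hypothetical plan node that holds references to its conjuncts."""
    conjuncts: list = field(default_factory=list)


def rewrite(expr: Expr) -> Expr:
    # Toy rule (assumption): fold one constant sub-expression.
    # The original node is not modified; a NEW node is returned.
    return Expr(expr.text.replace("1 + 1", "2"))


# Analysis produces the WHERE conjuncts.
where = [Expr("x = 1 + 1")]

# Conjunct assignment: the scan node captures the ORIGINAL objects.
scan = ScanNode(conjuncts=list(where))

# A later rewrite replaces the entries of the WHERE clause...
where = [rewrite(e) for e in where]

# ...but the scan node still points at the old, un-rewritten node.
stale = scan.conjuncts[0].text
fresh = where[0].text

# The workaround: discard the plan and build a new one from the
# rewritten expressions.
replanned = ScanNode(conjuncts=list(where))
```

Here `stale` still holds the original text while `fresh` holds the rewritten one, which is why only `replanned` reflects the rewrite.
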
> This then creates an interesting issue. We do the full analysis (and plan)
> because we need the column bindings in order to do the rewrite. Since
> plan/analysis is implemented as a single black box, rewrites can't be done
> before planning (no column binding yet) so must be done after (column
> bindings available, but so is the entire plan.)
> Some expression nodes have incomplete implementations. For example, {{X
> BETWEEN Y AND Z}} does not compute a cost (because it is a "virtual" node: it
> does not exist at run time, having been rewritten to {{Y <= X AND X <= Z}}.)
> This means that, not only do we throw away the first plan, that first plan
> was actually wrong: it used incomplete information.
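
As a rough illustration of the BETWEEN case, the sketch below models a "virtual" node that has no cost until it is rewritten. The node classes and the cost function are hypothetical, and the cost model (one unit per executable node) is made up for the example; it is not the planner's actual model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Between:
    operand: object
    low: object
    high: object


@dataclass(frozen=True)
class Le:
    left: object
    right: object


@dataclass(frozen=True)
class And:
    left: object
    right: object


def rewrite_between(expr):
    """Replace X BETWEEN Y AND Z with Y <= X AND X <= Z; leave other nodes alone."""
    if isinstance(expr, Between):
        return And(Le(expr.low, expr.operand), Le(expr.operand, expr.high))
    return expr


def cost(expr):
    """Toy cost (assumption): one unit per executable node, None if unknown."""
    if isinstance(expr, Between):
        return None  # virtual node: never executed, so it carries no cost
    if isinstance(expr, (And, Le)):
        parts = [cost(expr.left), cost(expr.right)]
        if None in parts:
            return None  # an un-rewritten child makes the total unknown
        return 1 + sum(parts)
    return 1  # leaf: column reference or literal


pred = Between(Col("x"), Lit(1), Lit(10))
before = cost(pred)                  # cost seen by the first plan
rewritten = rewrite_between(pred)
after = cost(rewritten)              # cost seen by the second plan
```

In this model `before` is unknown and `after` is a concrete number, mirroring the point that the first plan is built from incomplete information.
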
> Thus, in order to get the semantic info needed for rewrites (column
> bindings), we end up creating an entire plan which we must then discard and
> rebuild after doing the rewrites (so the planner has the full information.)
> h4. Alternative
> The alternative approach is to integrate expression rewrites into the planner
> process, rather than applying them from the outside, so that we make only a
> single pass through the planner. In particular:
> * Analyze expressions to create column bindings.
> * Match up SELECT and GROUP BY and other expressions (if required.) Each
> GROUP BY entry points to its SELECT clause node (so it will see rewrites)
> rather than to the original SELECT expression (which will be discarded.)
> * Rewrite SELECT and WHERE clause expressions. (Bound GROUP BY expressions
> will see the rewrites.)
> * Complete the plan as today.
> With this approach, we plan only once, and that plan has a full set of cost
> information based on the rewritten expressions which the BE will execute.
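
One way to picture the proposed single-pass flow is the sketch below. It assumes a design in which each GROUP BY entry is bound to a mutable SELECT slot instead of to an expression object; `SelectItem`, `GroupByRef`, `rewrite` and `plan_once` are hypothetical names, and column binding is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class SelectItem:
    """Hypothetical mutable slot in the SELECT list; rewrites replace `expr`."""
    expr: str


@dataclass
class GroupByRef:
    """Hypothetical GROUP BY entry bound to a SELECT slot, not to an expression."""
    item: SelectItem

    @property
    def expr(self) -> str:
        # Always read through the slot, so rewrites are visible here.
        return self.item.expr


def rewrite(expr: str) -> str:
    # Toy rule (assumption) standing in for the rewrite engine.
    return expr.replace("a + 0", "a")


def plan_once(select_exprs, group_by_positions):
    # 1. Analyze: create the SELECT slots (column binding omitted here).
    select = [SelectItem(e) for e in select_exprs]
    # 2. Match GROUP BY entries to SELECT slots.
    group_by = [GroupByRef(select[i]) for i in group_by_positions]
    # 3. Rewrite in place; bound GROUP BY entries see the result.
    for item in select:
        item.expr = rewrite(item.expr)
    # 4. Plan from the rewritten expressions (a dict stands in for the plan).
    return {
        "select": [s.expr for s in select],
        "group_by": [g.expr for g in group_by],
    }


plan = plan_once(["a + 0", "count(*)"], [0])
```

Because the GROUP BY entry reads through the slot, the plan is built once and already reflects the rewritten expression; nothing has to be discarded and rebuilt.
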
> The purpose of this ticket is to track this analysis and to later propose a
> detailed fix.