Julian Hyde commented on CALCITE-1440:

I definitely think it should be done in VolcanoPlanner - dynamic programming is 
even more important for multi-root trees.

I initially thought we could change the {{RelOptPlanner}} methods

void setRoot(RelNode rel);
RelNode findBestExp();

to

void setRoots(List<RelNode> rels);
List<RelNode> findBestExps();

The idea being to optimize several relational expressions at the same time. But 
then I realized we can combine the relational expressions using a new 
relational operator:

public class Combine extends AbstractRelNode {
  protected final ImmutableList<RelNode> inputs;

  public Combine(RelOptCluster cluster, RelTraitSet traitSet,
      List<RelNode> inputs) { ... }
}

This lets us pass multiple relational operators into and out of the planner, so 
we don't need to change {{setRoot}} and {{findBestExp}}.
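
To make the "pass multiple expressions through a single-root planner" idea concrete, here is a minimal sketch. It does not use Calcite's classes: {{Expr}}, {{Leaf}}, {{CombineExpr}}, {{SingleRootPlanner}} and {{MultiRoot}} are hypothetical stand-ins invented for illustration, and the sketch assumes the planner leaves a combine-like node at the top of the plan it returns.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a relational expression; not Calcite's RelNode. */
interface Expr {
  List<Expr> getInputs();
}

/** Hypothetical leaf expression, identified only by a name. */
final class Leaf implements Expr {
  final String name;

  Leaf(String name) {
    this.name = name;
  }

  @Override public List<Expr> getInputs() {
    return Collections.emptyList();
  }
}

/** Hypothetical binder: a single root whose inputs are the real roots. */
final class CombineExpr implements Expr {
  private final List<Expr> inputs;

  CombineExpr(List<Expr> inputs) {
    this.inputs = Collections.unmodifiableList(new ArrayList<>(inputs));
  }

  @Override public List<Expr> getInputs() {
    return inputs;
  }
}

/** Mirrors only the shape of the two single-root methods discussed above. */
interface SingleRootPlanner {
  void setRoot(Expr rel);

  Expr findBestExp();
}

final class MultiRoot {
  /** Optimizes several roots through an unchanged single-root planner. */
  static List<Expr> findBestExps(SingleRootPlanner planner, List<Expr> roots) {
    planner.setRoot(new CombineExpr(roots));
    // Assumption: the best plan still has a combine-like node at the top,
    // so its inputs are the optimized versions of the original roots.
    return planner.findBestExp().getInputs();
  }
}
```

A caller would hand {{MultiRoot.findBestExps}} any planner plus a list of roots and get back one optimized expression per root, while the planner interface itself keeps its single-root methods.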

{{Combine}} is similar to {{Union}} except that it doesn't require the inputs 
to have the same row type. Some more points about it:
* In order to execute, all of the inputs need to be executed, and therefore 
they all contribute to its cost.
* DML operations (insert, update, delete) are modeled as relational operators 
(albeit they return a single row with a single "row count" column) and 
therefore {{Combine}} can be used to represent a query that consists of 
multiple queries and DML statements.
** We'd need to take care if one statement executes after another and is 
intended to see the data that the earlier statement produced. For example, in 
the following, the two statements see different {{emp}} relations:
{code}
UPDATE emp SET sal = sal * 2 WHERE deptno = 10;
SELECT * FROM emp WHERE sal > 1000;
{code}
* A concrete implementation of {{Combine}} would make all of the constituent 
relational expressions accessible (say, as JDBC ResultSets), but we're mainly 
interested in it as a "binder" for planning purposes.
* Different variants of {{Combine}} might specify that the constituent queries 
run in series, or parallel, or some more complex order, or just say "I don't 
care". Sequencing matters a lot when we get to physical optimization (e.g. 
allocating scarce memory), but I don't think it matters much during logical 
optimization.
* The interesting plans produced for a {{Combine}} query almost certainly 
involve a {{Spool}} operator (see CALCITE-481). The {{Combine}} and {{Spool}} 
operators have a similar purpose: they both aim to make "difficult" graphs 
(forests and DAGs, respectively) look like trees.
* There would be changes to the various metadata classes. E.g. we'd add a 
{{public RelOptCost getCumulativeCost(Combine rel, RelMetadataQuery mq)}} 
method to one of the metadata providers.
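
As a rough illustration of the cost point above (every input of a {{Combine}} has to run, so every input contributes to its cost), here is a toy model. It is not Calcite's metadata API: {{ToyRel}}, {{ToyScan}}, {{ToyCombine}} and {{ToyCost}} are hypothetical names, costs are plain doubles rather than {{RelOptCost}}, and the sketch assumes the binder adds no cost of its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy stand-in for a relational expression; not Calcite's RelNode. */
interface ToyRel {
  List<ToyRel> inputs();

  /** Cost of this operator alone, excluding its inputs. */
  double selfCost();
}

/** Leaf operator with a fixed, made-up cost. */
final class ToyScan implements ToyRel {
  private final double cost;

  ToyScan(double cost) {
    this.cost = cost;
  }

  @Override public List<ToyRel> inputs() {
    return Collections.emptyList();
  }

  @Override public double selfCost() {
    return cost;
  }
}

/** Binder over several roots; the inputs need not share a row type. */
final class ToyCombine implements ToyRel {
  private final List<ToyRel> inputs;

  ToyCombine(List<ToyRel> inputs) {
    this.inputs = Collections.unmodifiableList(new ArrayList<>(inputs));
  }

  @Override public List<ToyRel> inputs() {
    return inputs;
  }

  @Override public double selfCost() {
    return 0d; // assumption: the binder itself does no work
  }
}

final class ToyCost {
  /** Self cost plus the cumulative cost of every input. */
  static double cumulativeCost(ToyRel rel) {
    double total = rel.selfCost();
    for (ToyRel input : rel.inputs()) {
      total += cumulativeCost(input);
    }
    return total;
  }
}
```

Note that this plain tree walk counts a subexpression once per parent that references it. A real implementation would presumably want to count shared work only once, which is part of why a {{Spool}}-like operator is interesting here.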

> Implement planner for converting multiple SQL statements to unified RelNode 
> Tree
> --------------------------------------------------------------------------------
>                 Key: CALCITE-1440
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1440
>             Project: Calcite
>          Issue Type: New Feature
>            Reporter: Chinmay Kolhatkar
>            Assignee: Julian Hyde
> This can be implemented as a separate planner or in {{VolcanoPlanner}} 
> itself. The planner should take multiple SQL statements as input and return a 
> unified {{RelNode}} tree.
> Example of above is as follows:
> The above 2 statements have a common path and hence can provide a unified 
> {{RelNode}} tree as follows:
> {noformat}
>  [Scan] -> [Project (COL1, COL2)] -> [Filter (COL4 = 'abc')] -> [Delta]
>                     |
>                     V
>             [Filter (COL3 > 10)]
>                     |
>                     v
>                  [Delta]
> {noformat}
