Re: Using VolcanoPlanner to create unified RelNode tree from multiple SQL

Julian Hyde Mon, 10 Oct 2016 10:43:54 -0700

It sounds as if your requirements go beyond CALCITE-481. If so, please log a 
JIRA case.


> On Oct 10, 2016, at 8:52 AM, Chinmay Kolhatkar <chin...@apache.org> wrote:
> 
> Thanks Julian. I think CALCITE-481 would be a good improvement.
> 
> On Wed, Oct 5, 2016 at 11:02 PM, Julian Hyde <jhyde.apa...@gmail.com 
> <mailto:jhyde.apa...@gmail.com>> wrote:
> 
>> What you are asking for is not currently possible but would be a very
>> interesting, useful and powerful extension to Calcite. And in my opinion it
>> fits well with the way with the way that Volcano/Cascades optimizers work.
>> The idea would be for the optimizer to work on a forest (a set of trees
>> with different roots, but shared leaves).
>> 
>> It is related to https://issues.apache.org/jira/browse/CALCITE-481, which
>> deals with trees that have a single root but store intermediate results so
>> that they can be used more than once. I surmise that you are also very
>> interested in maximizing the commonality between trees.
>> 
>> The trickiest part is to get the cost model right. You need to account for
>> a cost each time the work is done, not each time the result is used. Jesus
>> researched this
>> very topic, using integer linear programming (ILP) to incorporate these
>> considerations into the cost model.
>> 
>> Can you please create a JIRA case for it? Maybe Jesus can add some
>> comments there.
>> 
>> It is also related to Hive multi-table queries [1]. I have not seen a SQL
>> syntax for multi-output queries. Maybe we can devise one. Or does someone
>> know of one?
>> 
>> And it presents challenges/opportunities for engines. Some engines can
>> push to multiple consumers, or equivalently create a stream of tuples that
>> can be read by multiple consumers, others can only persist an intermediate
>> result that can be read by multiple consumers. Spark had their own version
>> of this discussion when implementing multitable in Hive-Spark[2].
>> 
>> Julian
>> 
>> [1] https://cwiki.apache.org/confluence/display/Hive/
>> GettingStarted#GettingStarted-MULTITABLEINSERT <https://cwiki.apache.org/ 
>> <https://cwiki.apache.org/>
>> confluence/display/Hive/GettingStarted#GettingStarted-MULTITABLEINSERT>
>> 
>> [2] https://issues.apache.org/jira/browse/SPARK-3622 
>> <https://issues.apache.org/jira/browse/SPARK-3622> <
>> https://issues.apache.org/jira/browse/SPARK-3622>
>> 
>>> On Oct 4, 2016, at 9:23 AM, Chinmay Kolhatkar <chin...@datatorrent.com>
>> wrote:
>>> 
>>> Dear Community,
>>> 
>>> I'm working on integration with Apache Apex and Apache Calcite
>> (underlying
>>> Apache Jira is: APEXMALHAR-1818).
>>> 
>>> I have a question related to conversion from SQL to RelNode Tree.
>>> 
>>> Is it possible that VolcanoPlanner can take multiple SQL statements as
>>> input and return a unified RelNode Tree?
>>> Example of above is as follows:
>>> 
>>> SELECT COL1, COL2 FROM TABLE WHERE COL3 > 10;
>>> SELECT COL1, COL2 FROM TABLE WHERE COL4  = 'abc';
>>> 
>>> Above 2 statements has a common path and hence can provide an unified
>>> RelNode tree as follows:
>>> 
>>> [Scan] -> [Project (COL1, COL2)] -> [Filter (COL4 = 'abc')] -> [Delta]
>>>                   |
>>>                   V
>>>           [Filter (COL3 > 10)]
>>>                   |
>>>                   v
>>>                [Delta]
>>> 
>>> 
>>> Thanks,
>>> Chinmay.

Re: Using VolcanoPlanner to create unified RelNode tree from multiple SQL

Reply via email to