[ 
https://issues.apache.org/jira/browse/STORM-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim updated STORM-1446:
--------------------------------
    Description: 
As suggested in 
https://issues.apache.org/jira/browse/STORM-1040?focusedCommentId=15036651&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15036651,
 compiling the logical plan from Calcite down to Storm physical plan will 
clarify the implementation of StormSQL.

> Motive behind this big change and benefits
This is started from [Julian's 
comment|https://issues.apache.org/jira/browse/STORM-1040?focusedCommentId=15034472&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15034472]
 and also [Milinda's 
comment|https://issues.apache.org/jira/browse/STORM-1040?focusedCommentId=15035182&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15035182].

For me having own relational algebras (rel) has several advantages,

* We can push operator handling logic to rel itself. Before that we should 
traverse Calcite logical rel tree with PostOrderRelNodeVisitor, and visitor 
needs to handle Calcite's rel directly. Now the logic how to configure Trident 
topology is all handled from separate rels.

* We sometimes want to have more derived rels compared to Calcite logical 
operators. One of example is Join. There's only one logical rel regarding join 
in Calcite - LogicalJoin - but we're now converting LogicalJoin to EquiJoin if 
conditions are met. If we have various types of join it will make the 
difference. We're not prepared yet, but streaming scan vs table scan, and 
streaming insert vs table insert are the other cases.

{code}
TridentStormAggregateRel(group=[{0}], EXPR$1=[COUNT()])
  TridentStormCalcRel(expr#0..4=[{inputs}], expr#5=[0], expr#6=[>($t0, $t5)], 
DEPTID=[$t3], EMPID=[$t0], $condition=[$t6])
    TridentStormEquiJoinRel(condition=[=($2, $3)], joinType=[inner])
      TridentStormStreamScanRel(table=[[EMP]])
      TridentStormStreamScanRel(table=[[DEPT]])
{code}

* We can even override the methods how to represent the rel in explain string 
if we think Calcite's explain is less informational. For example, showing 
initial parallelism (when we support) for Scan.

* We can apply query optimizations: Defining derived rels helps further query 
optimizations, like filter pushdown. Calcite rels is not aware of data source 
characteristic, and we can include it to our own rels.

  was:
As suggested in 
https://issues.apache.org/jira/browse/STORM-1040?focusedCommentId=15036651&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15036651,
 compiling the logical plan from Calcite down to Storm physical plan will 
clarify the implementation of StormSQL.

> Motive behind this big change and benefits
This is started from [Julian's 
comment](https://issues.apache.org/jira/browse/STORM-1040?focusedCommentId=15034472&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15034472)
 and also [Milinda's 
comment](https://issues.apache.org/jira/browse/STORM-1040?focusedCommentId=15035182&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15035182).

For me having own relational algebras (rel) has several advantages,

* We can push operator handling logic to rel itself. Before that we should 
traverse Calcite logical rel tree with PostOrderRelNodeVisitor, and visitor 
needs to handle Calcite's rel directly. Now the logic how to configure Trident 
topology is all handled from separate rels.

* We sometimes want to have more derived rels compared to Calcite logical 
operators. One of example is Join. There's only one logical rel regarding join 
in Calcite - LogicalJoin - but we're now converting LogicalJoin to EquiJoin if 
conditions are met. If we have various types of join it will make the 
difference. We're not prepared yet, but streaming scan vs table scan, and 
streaming insert vs table insert are the other cases.

{code}
TridentStormAggregateRel(group=[{0}], EXPR$1=[COUNT()])
  TridentStormCalcRel(expr#0..4=[{inputs}], expr#5=[0], expr#6=[>($t0, $t5)], 
DEPTID=[$t3], EMPID=[$t0], $condition=[$t6])
    TridentStormEquiJoinRel(condition=[=($2, $3)], joinType=[inner])
      TridentStormStreamScanRel(table=[[EMP]])
      TridentStormStreamScanRel(table=[[DEPT]])
{code}

* We can even override the methods how to represent the rel in explain string 
if we think Calcite's explain is less informational. For example, showing 
initial parallelism (when we support) for Scan.

* We can apply query optimizations: Defining derived rels helps further query 
optimizations, like filter pushdown. Calcite rels is not aware of data source 
characteristic, and we can include it to our own rels.


> Compile the Calcite logical plan to Storm Trident logical plan
> --------------------------------------------------------------
>
>                 Key: STORM-1446
>                 URL: https://issues.apache.org/jira/browse/STORM-1446
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-sql
>            Reporter: Haohui Mai
>            Assignee: Jungtaek Lim
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> As suggested in 
> https://issues.apache.org/jira/browse/STORM-1040?focusedCommentId=15036651&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15036651,
>  compiling the logical plan from Calcite down to Storm physical plan will 
> clarify the implementation of StormSQL.
> > Motive behind this big change and benefits
> This is started from [Julian's 
> comment|https://issues.apache.org/jira/browse/STORM-1040?focusedCommentId=15034472&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15034472]
>  and also [Milinda's 
> comment|https://issues.apache.org/jira/browse/STORM-1040?focusedCommentId=15035182&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15035182].
> For me having own relational algebras (rel) has several advantages,
> * We can push operator handling logic to rel itself. Before that we should 
> traverse Calcite logical rel tree with PostOrderRelNodeVisitor, and visitor 
> needs to handle Calcite's rel directly. Now the logic how to configure 
> Trident topology is all handled from separate rels.
> * We sometimes want to have more derived rels compared to Calcite logical 
> operators. One of example is Join. There's only one logical rel regarding 
> join in Calcite - LogicalJoin - but we're now converting LogicalJoin to 
> EquiJoin if conditions are met. If we have various types of join it will make 
> the difference. We're not prepared yet, but streaming scan vs table scan, and 
> streaming insert vs table insert are the other cases.
> {code}
> TridentStormAggregateRel(group=[{0}], EXPR$1=[COUNT()])
>   TridentStormCalcRel(expr#0..4=[{inputs}], expr#5=[0], expr#6=[>($t0, $t5)], 
> DEPTID=[$t3], EMPID=[$t0], $condition=[$t6])
>     TridentStormEquiJoinRel(condition=[=($2, $3)], joinType=[inner])
>       TridentStormStreamScanRel(table=[[EMP]])
>       TridentStormStreamScanRel(table=[[DEPT]])
> {code}
> * We can even override the methods how to represent the rel in explain string 
> if we think Calcite's explain is less informational. For example, showing 
> initial parallelism (when we support) for Scan.
> * We can apply query optimizations: Defining derived rels helps further query 
> optimizations, like filter pushdown. Calcite rels is not aware of data source 
> characteristic, and we can include it to our own rels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to