[ https://issues.apache.org/jira/browse/STORM-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jungtaek Lim updated STORM-1446: -------------------------------- Description: As suggested in https://issues.apache.org/jira/browse/STORM-1040?focusedCommentId=15036651&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15036651, compiling the logical plan from Calcite down to Storm physical plan will clarify the implementation of StormSQL. > Motive behind this big change and benefits This is started from [Julian's comment|https://issues.apache.org/jira/browse/STORM-1040?focusedCommentId=15034472&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15034472] and also [Milinda's comment|https://issues.apache.org/jira/browse/STORM-1040?focusedCommentId=15035182&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15035182]. For me having own relational algebras (rel) has several advantages, * We can push operator handling logic to rel itself. Before that we should traverse Calcite logical rel tree with PostOrderRelNodeVisitor, and visitor needs to handle Calcite's rel directly. Now the logic how to configure Trident topology is all handled from separate rels. * We sometimes want to have more derived rels compared to Calcite logical operators. One of example is Join. There's only one logical rel regarding join in Calcite - LogicalJoin - but we're now converting LogicalJoin to EquiJoin if conditions are met. If we have various types of join it will make the difference. We're not prepared yet, but streaming scan vs table scan, and streaming insert vs table insert are the other cases. {code} TridentStormAggregateRel(group=[{0}], EXPR$1=[COUNT()]) TridentStormCalcRel(expr#0..4=[{inputs}], expr#5=[0], expr#6=[>($t0, $t5)], DEPTID=[$t3], EMPID=[$t0], $condition=[$t6]) TridentStormEquiJoinRel(condition=[=($2, $3)], joinType=[inner]) TridentStormStreamScanRel(table=[[EMP]]) TridentStormStreamScanRel(table=[[DEPT]]) {code} * We can even override the methods how to represent the rel in explain string if we think Calcite's explain is less informational. For example, showing initial parallelism (when we support) for Scan. * We can apply query optimizations: Defining derived rels helps further query optimizations, like filter pushdown. Calcite rels is not aware of data source characteristic, and we can include it to our own rels. was: As suggested in https://issues.apache.org/jira/browse/STORM-1040?focusedCommentId=15036651&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15036651, compiling the logical plan from Calcite down to Storm physical plan will clarify the implementation of StormSQL. > Motive behind this big change and benefits This is started from [Julian's comment](https://issues.apache.org/jira/browse/STORM-1040?focusedCommentId=15034472&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15034472) and also [Milinda's comment](https://issues.apache.org/jira/browse/STORM-1040?focusedCommentId=15035182&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15035182). For me having own relational algebras (rel) has several advantages, * We can push operator handling logic to rel itself. Before that we should traverse Calcite logical rel tree with PostOrderRelNodeVisitor, and visitor needs to handle Calcite's rel directly. Now the logic how to configure Trident topology is all handled from separate rels. * We sometimes want to have more derived rels compared to Calcite logical operators. One of example is Join. There's only one logical rel regarding join in Calcite - LogicalJoin - but we're now converting LogicalJoin to EquiJoin if conditions are met. If we have various types of join it will make the difference. We're not prepared yet, but streaming scan vs table scan, and streaming insert vs table insert are the other cases. {code} TridentStormAggregateRel(group=[{0}], EXPR$1=[COUNT()]) TridentStormCalcRel(expr#0..4=[{inputs}], expr#5=[0], expr#6=[>($t0, $t5)], DEPTID=[$t3], EMPID=[$t0], $condition=[$t6]) TridentStormEquiJoinRel(condition=[=($2, $3)], joinType=[inner]) TridentStormStreamScanRel(table=[[EMP]]) TridentStormStreamScanRel(table=[[DEPT]]) {code} * We can even override the methods how to represent the rel in explain string if we think Calcite's explain is less informational. For example, showing initial parallelism (when we support) for Scan. * We can apply query optimizations: Defining derived rels helps further query optimizations, like filter pushdown. Calcite rels is not aware of data source characteristic, and we can include it to our own rels. > Compile the Calcite logical plan to Storm Trident logical plan > -------------------------------------------------------------- > > Key: STORM-1446 > URL: https://issues.apache.org/jira/browse/STORM-1446 > Project: Apache Storm > Issue Type: Improvement > Components: storm-sql > Reporter: Haohui Mai > Assignee: Jungtaek Lim > Time Spent: 1h > Remaining Estimate: 0h > > As suggested in > https://issues.apache.org/jira/browse/STORM-1040?focusedCommentId=15036651&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15036651, > compiling the logical plan from Calcite down to Storm physical plan will > clarify the implementation of StormSQL. > > Motive behind this big change and benefits > This is started from [Julian's > comment|https://issues.apache.org/jira/browse/STORM-1040?focusedCommentId=15034472&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15034472] > and also [Milinda's > comment|https://issues.apache.org/jira/browse/STORM-1040?focusedCommentId=15035182&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15035182]. > For me having own relational algebras (rel) has several advantages, > * We can push operator handling logic to rel itself. Before that we should > traverse Calcite logical rel tree with PostOrderRelNodeVisitor, and visitor > needs to handle Calcite's rel directly. Now the logic how to configure > Trident topology is all handled from separate rels. > * We sometimes want to have more derived rels compared to Calcite logical > operators. One of example is Join. There's only one logical rel regarding > join in Calcite - LogicalJoin - but we're now converting LogicalJoin to > EquiJoin if conditions are met. If we have various types of join it will make > the difference. We're not prepared yet, but streaming scan vs table scan, and > streaming insert vs table insert are the other cases. > {code} > TridentStormAggregateRel(group=[{0}], EXPR$1=[COUNT()]) > TridentStormCalcRel(expr#0..4=[{inputs}], expr#5=[0], expr#6=[>($t0, $t5)], > DEPTID=[$t3], EMPID=[$t0], $condition=[$t6]) > TridentStormEquiJoinRel(condition=[=($2, $3)], joinType=[inner]) > TridentStormStreamScanRel(table=[[EMP]]) > TridentStormStreamScanRel(table=[[DEPT]]) > {code} > * We can even override the methods how to represent the rel in explain string > if we think Calcite's explain is less informational. For example, showing > initial parallelism (when we support) for Scan. > * We can apply query optimizations: Defining derived rels helps further query > optimizations, like filter pushdown. Calcite rels is not aware of data source > characteristic, and we can include it to our own rels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)