[GitHub] storm issue #1736: STORM-1446 Compile the Calcite logical plan to Storm Trid...

HeartSaVioR Mon, 17 Oct 2016 00:15:58 -0700

Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/1736
  
    @manuzhang 
    Thanks for showing interest on this. Let me rearrange your questions:
    
    > Has this changed the workflow of StormSQL?
    
    No, we still rely on Trident, and nothing changed in point of workflow.
    
    > Motive behind this big change and benefits
    
    This is started from [Julian's 
comment](https://issues.apache.org/jira/browse/STORM-1040?focusedCommentId=15034472&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15034472)
 and also [Milinda's 
comment](https://issues.apache.org/jira/browse/STORM-1040?focusedCommentId=15035182&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15035182).
    
    For me having own relational algebras (rel) has several advantages, 
    
    - We can push operator handling logic to rel itself. Before that we should 
traverse Calcite logical rel  tree with PostOrderRelNodeVisitor, and visitor 
needs to handle Calcite's rel directly. Now the logic how to configure Trident 
topology is all handled from separate rels.
    
    - We sometimes want to have more derived rels compared to Calcite logical 
operators. One of example is `Join`.  There's only one logical rel regarding 
join in Calcite - LogicalJoin - but we're now converting LogicalJoin to 
EquiJoin if conditions are met. If we have various types of join it will make 
the difference. We're not prepared yet, but streaming scan vs table scan, and 
streaming insert vs table insert are the other cases.
    
    ```
    TridentStormAggregateRel(group=[{0}], EXPR$1=[COUNT()])
      TridentStormCalcRel(expr#0..4=[{inputs}], expr#5=[0], expr#6=[>($t0, 
$t5)], DEPTID=[$t3], EMPID=[$t0], $condition=[$t6])
        TridentStormEquiJoinRel(condition=[=($2, $3)], joinType=[inner])
          TridentStormStreamScanRel(table=[[EMP]])
          TridentStormStreamScanRel(table=[[DEPT]])
    ```
    
    We can even override the methods how to represent the rel in explain string 
if we think Calcite's explain is less informational. For example, showing 
initial parallelism (when we support) for Scan.
    
    - This patch starts addressing query optimizations. One of example is Calc, 
which is for merging multiple calculations into one. No need to filter and 
projection separately. They're still not minimized (as projection is), but it 
can be addressed after STORM-2072. Defining derived rels helps further query 
optimizations, like filter pushdown. Calcite rels is not aware of data source 
characteristic, and we can include it to our own rels.
    
    Please don't hesitate to ask questions. Thanks!



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1736: STORM-1446 Compile the Calcite logical plan to Storm Trid...

Reply via email to