Hi devs, I'm working on improving Storm SQL to make it useful for actual use cases.
I saw some comments [1] from Julian and Milinda in the early stage of Storm SQL which stated that having Storm physical algebra would be better than operating on Calcite logical algebra. Could you shed a light on understanding what that comments meaning? I'd like to get some resources, documents, and recommendations. I'm reading the code of Samza SQL planner but would also like to read some documents for that. Thanks in advance for your support. Explaining some backgrounds on current Storm SQL: Storm SQL relies on Trident which is a high-level abstraction on "micro-batch" transactional API. Trident supports streaming computations like functions, filters, grouping, aggregations, joins so Calcite logical algebra could be directly matched to Trident topology. Because of that we still rely on converting calcite logical plan to Trident topology [2], but it generates lots of nodes (operators) which should be optimized [3]. Trident itself will optimize them for merging nodes into one execution component - Bolt - but it doesn't address the SQL optimizations like pushdown filter. Thanks! Jungtaek Lim (HeartSaVioR) [1] https://issues.apache.org/jira/browse/STORM-1040?focusedCommentId=15036651&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15036651 [2] https://github.com/apache/storm/blob/master/external/sql/storm-sql-core/src/jvm/org/apache/storm/sql/compiler/backends/trident/TridentLogicalPlanCompiler.java [3] https://docs.google.com/spreadsheets/d/1tTyIxbNafvVw0tQjWjiBQCKDCPbbnfbT5JZ7F5JEBNE
