Dear Calcite Devs,

I’m currently looking into an issue in the SQL extension [1] of Apache Beam and 
was hoping to find some advice here.
Using a bunch of Calcite ConverterRules [2], we convert a RelNode tree into a 
tree of BeamRelNodes which is then used to build a Beam DAG. Pretty standard I 
suppose …

What I’m scratching my head about is that applying the converter rules changes 
the semantics of the graph, which I did not expect. Or is that expectation 
wrong?
Here’s a very simple SQL example to illustrate this (see also BeamUnionRelTest 
[3]):

SELECT  order_id, site_id, price FROM ORDER_DETAILS
UNION ALL
SELECT  order_id, site_id, price FROM ORDER_DETAILS

This is where we start, with the corresponding RelNode:

LogicalUnion.NONE(input#0=LogicalProject#8,input#1=LogicalProject#10,all=true)

Once the corresponding conversion rule [4] is applied to the above by the 
CheapestPlanReplacer, but before its old inputs are visited, we get the following:

BeamUnionRel.BEAM_LOGICAL(input#0=RelSubset#25,input#1=RelSubset#25,all=true)

At this point both inputs refer to the same node (#25). However, once the 
inputs are visited in CheapestPlanReplacer, that semantic information is lost 
because RelNodes get copied if their inputs change [5].
In the result below, the two inputs refer to different nodes:

BeamUnionRel.BEAM_LOGICAL(input#0=BeamCalcRel#49,input#1=BeamCalcRel#50,all=true)

This, however, currently prevents caching of intermediate results in the Beam 
DAG when this might be beneficial.

Would you have any advice on how to better approach this?
Of course, I could use a stateful copy operation that returns the same result 
for repeated copies with the same parameters. But that seems wrong, to be honest.
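
To make the idea concrete, here is a minimal, self-contained sketch of what I 
mean by a stateful copy. It does not use Calcite at all: Node, replaceNaive and 
replaceMemoized are hypothetical stand-ins for RelNode and the replacer logic, 
and the identity-keyed memo is my assumption of how such state could be kept.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class SharedInputReplacer {
    /** Hypothetical stand-in for a RelNode: a name plus its inputs. */
    static final class Node {
        final String name;
        final List<Node> inputs;

        Node(String name, List<Node> inputs) {
            this.name = name;
            this.inputs = inputs;
        }
    }

    // Keyed by object identity, so the same input object always maps to
    // the same converted object.
    private final Map<Node, Node> memo = new IdentityHashMap<>();

    /** Stateless replacement: every visit creates a fresh copy. */
    static Node replaceNaive(Node node) {
        List<Node> newInputs = new ArrayList<>();
        for (Node input : node.inputs) {
            newInputs.add(replaceNaive(input));
        }
        return new Node("Beam" + node.name, newInputs);
    }

    /** Stateful replacement: a node visited twice is converted only once. */
    Node replaceMemoized(Node node) {
        Node cached = memo.get(node);
        if (cached != null) {
            return cached;
        }
        List<Node> newInputs = new ArrayList<>();
        for (Node input : node.inputs) {
            newInputs.add(replaceMemoized(input));
        }
        Node converted = new Node("Beam" + node.name, newInputs);
        memo.put(node, converted);
        return converted;
    }

    public static void main(String[] args) {
        // Both union inputs are the same object, like RelSubset#25 above.
        Node calc = new Node("Calc", List.of());
        Node union = new Node("Union", List.of(calc, calc));

        Node naive = replaceNaive(union);
        Node memoized = new SharedInputReplacer().replaceMemoized(union);

        // Naive copy: two distinct input objects, sharing is lost.
        System.out.println(naive.inputs.get(0) == naive.inputs.get(1));
        // Memoized copy: both inputs are the same object again.
        System.out.println(memoized.inputs.get(0) == memoized.inputs.get(1));
    }
}
```

In this toy model the memoized variant keeps both union inputs pointing at one 
converted node, which is the property I would need for caching in the Beam DAG.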

Thanks a million!
Kind regards,
Moritz


[1] https://github.com/apache/beam/issues/24314
[2] https://github.com/apache/beam/blob/6ba647333c4c69fb6dfc65929456c7c11570382f/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamRuleSets.java#L136-L152
[3] https://github.com/apache/beam/blob/6ba647333c4c69fb6dfc65929456c7c11570382f/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUnionRelTest.java#L52-L71
[4] https://github.com/apache/beam/blob/a96afe2c57c45a869a622086eaa4f81305f06e72/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamUnionRule.java
[5] https://github.com/apache/calcite/blob/a326bd2d0e0b4b6b3336f10217b0ecbb79522239/core/src/main/java/org/apache/calcite/plan/volcano/RelSubset.java#L727-L739
