Could you explain more how this affects Beam? Can Beam capture the row type
from the logical plan and pass it through to the output phase?

In my company's project, that is what we did. The project crafts the
logical plan, grabs the output type, executes the planner and passes the
resulting RelNode and the output type from the logical plan into the
execution phase.

On Thu, Feb 2, 2023 at 1:56 PM Andrew Pilloud <[email protected]>
wrote:

> A Beam SQL user found an issue where we are discarding their output field
> names that appears to be related to a change to the VolcanoPlanner to treat
> rel nodes that only differ by field names as equivalent:
>
> https://github.com/apache/calcite/commit/1e9b4da0573ec73d332d4e65fb7fd30491b4318d#diff-008c6d52bfd93bbe963a23c264bc412c68cac3b4837e3f10b8d5e4858cd4acb8L1136
>
> This creates an interaction between CalcMergeRule and the VolcanoPlanner
> when there are two equivalence sets with a Calc that only differs by output
> column name. The sets are merged and the output column rename is lost. This
> occurs with both Calcite's LogicalCalc and BeamCalc with a series of three
> mergeable Calc nodes.
>
> The minimal reproduction I've derived in Beam involves running the planner
> with CoreRules.FILTER_TO_CALC, CoreRules.PROJECT_TO_CALC,
> CoreRules.CALC_MERGE, and BeamCalcRule.INSTANCE using this test query:
> WITH tempTable (id, v) AS (SELECT f_int as id, f_string as v FROM
> PCOLLECTION) SELECT id AS fout_int, v AS fout_string FROM tempTable WHERE
> id >= 1
>
> It creates a logical plan like this:
>     LogicalProject(fout_int=[$0], fout_string=[$1])
>       LogicalFilter(condition=[>=($0, 1)])
>         LogicalProject(id=[$0], v=[$1])
>           BeamIOSourceRel(table=[[beam, PCOLLECTION]])
>
> And the planner creates a physical plan like this (with what started as the
> top LogicalProject dropped):
>     BeamCalcRel(expr#0..1=[{inputs}], expr#2=[1], expr#3=[>=($t0, $t2)],
> proj#0..1=[{exprs}], $condition=[$t3])
>       BeamIOSourceRel(table=[[beam, PCOLLECTION]])
>
> I was able to work around it by limiting calc merging rules to our BeamCalc
> and adding field names back to the equivalency check there.
>
> https://github.com/apache/beam/pull/25290/files#diff-ead622461b5c25264d0c680fcacde454ff457b8c05dc73164cafd298573f56bcR58
>
> This seems like a bug to me but we've previously been told that Calcite
> doesn't promise to retain field names so I assume that Calcite doesn't
> consider this a bug? See
>
> https://issues.apache.org/jira/browse/CALCITE-1584?focusedCommentId=16031351&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16031351
>

Reply via email to