Re: Output column names of Calc discarded by VolcanoPlanner in Calcite 2.22+

Julian Hyde Fri, 03 Feb 2023 11:28:14 -0800

Thanks for citing CALCITE-1584. That pretty much is our position.

Beam should not rely on field names, or base equality on field names.
(Otherwise the planner will start to choke on cruft.)


If you need to know where a field came from, use provenance tools like
RelMdColumnOrigins.

That said, every planner rule should endeavor to generate good field
names. They make for a better debugging experience, and every rule
needs to play its part. If a rule that converts Project and/or Filter
to Calc doesn't generate sensible field names we would consider that a
minor bug and would accept a PR.

Julian


On Fri, Feb 3, 2023 at 8:02 AM Scott Reynolds <[email protected]> wrote:
>
> Could you explain more how this affects Beam? Can Beam capture the row type
> from the logical plan and pass it through to the output phase?
>
> In my company's project, that is what we did. The project crafts the
> logical plan, grabs the output type, executes the planner and passes the
> resulting RelNode and the output type from the logical plan into the
> execution phase.
>
> On Thu, Feb 2, 2023 at 1:56 PM Andrew Pilloud <[email protected]>
> wrote:
>
> > A Beam SQL user found an issue where we are discarding their output field
> > names that appears to be related to a change to the VolcanoPlanner to treat
> > rel nodes that only differ by field names as equivalent:
> >
> > https://github.com/apache/calcite/commit/1e9b4da0573ec73d332d4e65fb7fd30491b4318d#diff-008c6d52bfd93bbe963a23c264bc412c68cac3b4837e3f10b8d5e4858cd4acb8L1136
> >
> > This creates an interaction between CalcMergeRule and the VolcanoPlanner
> > when there are two equivalence sets with a Calc that only differs by output
> > column name. The sets are merged and the output column rename is lost. This
> > occurs with both Calcite's LogicalCalc and BeamCalc with a series of three
> > mergeable Calc nodes.
> >
> > The minimal reproduction I've derived in Beam involves running the planner
> > with CoreRules.FILTER_TO_CALC, CoreRules.PROJECT_TO_CALC,
> > CoreRules.CALC_MERGE, and BeamCalcRule.INSTANCE using this test query:
> > WITH tempTable (id, v) AS (SELECT f_int as id, f_string as v FROM
> > PCOLLECTION) SELECT id AS fout_int, v AS fout_string FROM tempTable WHERE
> > id >= 1
> >
> > It creates a logical plan like this:
> >     LogicalProject(fout_int=[$0], fout_string=[$1])
> >       LogicalFilter(condition=[>=($0, 1)])
> >         LogicalProject(id=[$0], v=[$1])
> >           BeamIOSourceRel(table=[[beam, PCOLLECTION]])
> >
> > And the planner creates a physical plan like this (with what started as the
> > top LogicalProject dropped):
> >     BeamCalcRel(expr#0..1=[{inputs}], expr#2=[1], expr#3=[>=($t0, $t2)],
> > proj#0..1=[{exprs}], $condition=[$t3])
> >       BeamIOSourceRel(table=[[beam, PCOLLECTION]])
> >
> > I was able to work around it by limiting calc merging rules to our BeamCalc
> > and adding field names back to the equivalency check there.
> >
> > https://github.com/apache/beam/pull/25290/files#diff-ead622461b5c25264d0c680fcacde454ff457b8c05dc73164cafd298573f56bcR58
> >
> > This seems like a bug to me but we've previously been told that Calcite
> > doesn't promise to retain field names so I assume that Calcite doesn't
> > consider this a bug? See
> >
> > https://issues.apache.org/jira/browse/CALCITE-1584?focusedCommentId=16031351&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16031351
> >

Re: Output column names of Calc discarded by VolcanoPlanner in Calcite 2.22+

Reply via email to