Ok this makes sense, it's not really a bug, then we fall into the option 2.

When you mention column pruning is there a specific rule TransformationRule
you have in mind?
https://github.com/apache/calcite/tree/main/core/src/main/java/org/apache/calcite/rel/rules


On Tue, Oct 3, 2023 at 3:59 PM Julian Hyde <[email protected]> wrote:

> Well, it’s not really “there”. It is removed by the final project. If you
> want to remove it earlier, do column pruning.
>
> (We aren’t aggressive with column pruning because it will introduce extra
> projects into the plan, therefore make the plan more complex, and increase
> the planner’s search space.)
>
> When we generate SQL, most people would prefer to see ’SELECT *’ in the
> intermediate subqueries rather than a more focused list of columns, because
> the ’SELECT *’ form is shorter. We assume that the target SQL system will
> do column pruning and therefore the queries are equivalent.
>
> > On Oct 3, 2023, at 3:53 PM, Guillaume Masse <
> [email protected]> wrote:
> >
> > It's because column2 is a column I created artificially to allow the
> > validation of a query. I normally perform a transformation (RelNode =>
> > RelNode) that removes any reference to that column. The problem is that
> > when I convert this Logical Plan back to SQL, it contains references to
> > that column and the execution engine should not see that column and
> throws
> > an error.
> >
> > LogicalProject(column1=[$1], column10=[$4])
> >  LogicalJoin(condition=[=($2, $4)], joinType=[inner])
> >    LogicalProject(column2=[$0], column1=[$1], $f2=[LOWER($1)])
> >      LogicalTableScan(table=[[t1]])
> >    LogicalTableScan(table=[[t2]])
> >
> > My problem is the added field column2=[$0] in the LogicalProject. Since
> > it's not contributing to the query, it should not be there.
> >
> > On Tue, Oct 3, 2023 at 3:24 PM Julian Hyde <[email protected]>
> wrote:
> >
> >> Are you worried that $3 in the first plan has become $4 in the second
> plan?
> >>
> >> That’s easily explained because the intermediate table has an extra
> >> column, LOWER($1). That pushes up the offset of all columns coming from
> the
> >> right-hand table.
> >>
> >>> On Oct 3, 2023, at 3:14 PM, Guillaume Masse <
> >> [email protected]> wrote:
> >>>
> >>> Hi All,
> >>>
> >>> I have two tables: t1 and t2 both have a column1 of type t and a
> column2
> >>> that I'm not using
> >>> I have a function f: t -> t defined in the catalog
> >>>
> >>> After validation if I print the logical plan I get the following
> result:
> >>>
> >>> select
> >>> t1.column1,
> >>> t2.column1
> >>> from company_data."1" t1
> >>> join company_data."2" t2
> >>> on t1.column1 = t2.column1
> >>>
> >>> LogicalProject(column1=[$1], column10=[$3])
> >>> LogicalJoin(condition=[=($1, $3)], joinType=[inner])
> >>>   LogicalTableScan(table=[[t1]])
> >>>   LogicalTableScan(table=[[t2]])
> >>>
> >>>
> >>> select
> >>> t1.column1,
> >>> t2.column1
> >>> from company_data."1" t1
> >>> join company_data."2" t2
> >>> on lower(t1.column1) = t2.column1
> >>>
> >>> LogicalProject(column1=[$1], column10=[$4])
> >>> LogicalJoin(condition=[=($2, $4)], joinType=[inner])
> >>>   LogicalProject(column2=[$0], column1=[$1], $f2=[LOWER($1)])
> >>>     LogicalTableScan(table=[[t1]])
> >>>   LogicalTableScan(table=[[t2]])
> >>>
> >>>
> >>> I'm a bit surprise by the second logical plan because it projects
> >> column2.
> >>>
> >>> 1) I'm wondering if this is a bug.
> >>>
> >>> 2) If it's not a bug, let's say it was injected by the validator to
> >>> simplify the validation process, is there a way to remove it? It feels
> a
> >>> bit weird, because I would have to rewrite all the offsets when I'm
> >>> reconstructing the tree:
> >>>
> >>> LogicalProject(column1=[$0], column10=[$3])
> >>> LogicalJoin(condition=[=($1, $3)], joinType=[inner])
> >>>   LogicalProject(column1=[$0], $f2=[LOWER($1)])
> >>>     LogicalTableScan(table=[[t1]])
> >>>   LogicalTableScan(table=[[t2]])
> >>>
> >>>
> >>> Thank You!
> >>>
> >>> --
> >>> Guillaume Massé
> >>> [Gee-OHM]
> >>> (马赛卫)
> >>
> >>
>
>

Reply via email to