Well, it’s not really “there”. It is removed by the final project. If you want 
to remove it earlier, do column pruning.

(We aren’t aggressive with column pruning because it will introduce extra 
projects into the plan, therefore make the plan more complex, and increase the 
planner’s search space.)

When we generate SQL, most people would prefer to see ’SELECT *’ in the 
intermediate subqueries rather than a more focused list of columns, because the 
’SELECT *’ form is shorter. We assume that the target SQL system will do column 
pruning and therefore the queries are equivalent. 

> On Oct 3, 2023, at 3:53 PM, Guillaume Masse 
> <[email protected]> wrote:
> 
> It's because column2 is a column I created artificially to allow the
> validation of a query. I normally perform a transformation (RelNode =>
> RelNode) that removes any reference to that column. The problem is that
> when I convert this Logical Plan back to SQL, it contains references to
> that column and the execution engine should not see that column and throws
> an error.
> 
> LogicalProject(column1=[$1], column10=[$4])
>  LogicalJoin(condition=[=($2, $4)], joinType=[inner])
>    LogicalProject(column2=[$0], column1=[$1], $f2=[LOWER($1)])
>      LogicalTableScan(table=[[t1]])
>    LogicalTableScan(table=[[t2]])
> 
> My problem is the added field column2=[$0] in the LogicalProject. Since
> it's not contributing to the query, it should not be there.
> 
> On Tue, Oct 3, 2023 at 3:24 PM Julian Hyde <[email protected]> wrote:
> 
>> Are you worried that $3 in the first plan has become $4 in the second plan?
>> 
>> That’s easily explained because the intermediate table has an extra
>> column, LOWER($1). That pushes up the offset of all columns coming from the
>> right-hand table.
>> 
>>> On Oct 3, 2023, at 3:14 PM, Guillaume Masse <
>> [email protected]> wrote:
>>> 
>>> Hi All,
>>> 
>>> I have two tables: t1 and t2 both have a column1 of type t and a column2
>>> that I'm not using
>>> I have a function f: t -> t defined in the catalog
>>> 
>>> After validation if I print the logical plan I get the following result:
>>> 
>>> select
>>> t1.column1,
>>> t2.column1
>>> from company_data."1" t1
>>> join company_data."2" t2
>>> on t1.column1 = t2.column1
>>> 
>>> LogicalProject(column1=[$1], column10=[$3])
>>> LogicalJoin(condition=[=($1, $3)], joinType=[inner])
>>>   LogicalTableScan(table=[[t1]])
>>>   LogicalTableScan(table=[[t2]])
>>> 
>>> 
>>> select
>>> t1.column1,
>>> t2.column1
>>> from company_data."1" t1
>>> join company_data."2" t2
>>> on lower(t1.column1) = t2.column1
>>> 
>>> LogicalProject(column1=[$1], column10=[$4])
>>> LogicalJoin(condition=[=($2, $4)], joinType=[inner])
>>>   LogicalProject(column2=[$0], column1=[$1], $f2=[LOWER($1)])
>>>     LogicalTableScan(table=[[t1]])
>>>   LogicalTableScan(table=[[t2]])
>>> 
>>> 
>>> I'm a bit surprise by the second logical plan because it projects
>> column2.
>>> 
>>> 1) I'm wondering if this is a bug.
>>> 
>>> 2) If it's not a bug, let's say it was injected by the validator to
>>> simplify the validation process, is there a way to remove it? It feels a
>>> bit weird, because I would have to rewrite all the offsets when I'm
>>> reconstructing the tree:
>>> 
>>> LogicalProject(column1=[$0], column10=[$3])
>>> LogicalJoin(condition=[=($1, $3)], joinType=[inner])
>>>   LogicalProject(column1=[$0], $f2=[LOWER($1)])
>>>     LogicalTableScan(table=[[t1]])
>>>   LogicalTableScan(table=[[t2]])
>>> 
>>> 
>>> Thank You!
>>> 
>>> --
>>> Guillaume Massé
>>> [Gee-OHM]
>>> (马赛卫)
>> 
>> 

Reply via email to