jonmmease commented on issue #5034: URL: https://github.com/apache/arrow-datafusion/issues/5034#issuecomment-1402199586
I think this is closely related to https://github.com/apache/arrow-datafusion/pull/4050 by @andygrove.

Here is the optimized logical plan that's generated (with `SingleDistinctToGroupBy` in place) for this issue's query:

```
Projection: tbl.colA, q1.colB, q1.colC
  Inner Join: Using tbl.colB = q1.colB
    TableScan: tbl projection=[colA, colB]
    SubqueryAlias: q1
      Projection: tbl.colB, COUNT(DISTINCT tbl.colA) AS colC
        Projection: group_alias_0 AS tbl.colB, COUNT(alias1) AS COUNT(DISTINCT tbl.colA)
          Aggregate: groupBy=[[group_alias_0]], aggr=[[COUNT(alias1)]]
            Aggregate: groupBy=[[tbl.colB AS group_alias_0, tbl.colA AS alias1]], aggr=[[]]
              TableScan: tbl projection=[colA, colB]
```

The `group_alias_0 AS tbl.colB` fragment (which is introduced by the `SingleDistinctToGroupBy` optimizer rule) creates a new unqualified column named "tbl.colB", which isn't the same thing as the original qualified column "tbl"."colB". The join on `tbl.colB = q1.colB` then fails to match the "tbl.colB" column during physical planning.
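To make the qualified vs. unqualified distinction concrete, here is a minimal sketch. The `Column` struct and its helper methods are a simplified, hypothetical stand-in written for illustration, not DataFusion's actual types. It shows how two column references can print identically as `tbl.colB` yet compare as different columns:

```rust
// Simplified, hypothetical model of a column reference (NOT DataFusion's real type).
#[derive(Debug, Clone, PartialEq, Eq)]
struct Column {
    relation: Option<String>, // table qualifier, if any
    name: String,             // column name
}

impl Column {
    // A column qualified by a table, e.g. "tbl"."colB".
    fn qualified(relation: &str, name: &str) -> Self {
        Column { relation: Some(relation.to_string()), name: name.to_string() }
    }

    // A column with no qualifier whose name may itself contain a dot.
    fn unqualified(name: &str) -> Self {
        Column { relation: None, name: name.to_string() }
    }

    // Flat display form, as it would appear in a printed plan.
    fn flat_name(&self) -> String {
        match &self.relation {
            Some(r) => format!("{}.{}", r, self.name),
            None => self.name.clone(),
        }
    }
}

fn main() {
    let original = Column::qualified("tbl", "colB"); // "tbl"."colB"
    let aliased = Column::unqualified("tbl.colB");   // "tbl.colB"

    // Both render the same way in a printed plan...
    assert_eq!(original.flat_name(), aliased.flat_name());
    // ...but they are structurally different, so a lookup by one misses the other.
    assert_ne!(original, aliased);

    println!("{:?} vs {:?}", original, aliased);
}
```

Under this simplified model, a lookup keyed on the qualified column would not find the aliased one, which is consistent with the join failing to resolve `tbl.colB` during physical planning.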
