[GitHub] [arrow-datafusion] jonmmease commented on issue #5034: Error during physical planning when joining to subquery with count distinct aggregate

via GitHub Tue, 24 Jan 2023 08:07:29 -0800


jonmmease commented on issue #5034:
URL: 
https://github.com/apache/arrow-datafusion/issues/5034#issuecomment-1402199586


   I think this is closely related to 
https://github.com/apache/arrow-datafusion/pull/4050 by @andygrove.
   
   Here is the optimized logical plan that's generated (with 
`SingleDistinctToGroupBy` in place) for this issue's query:
   ```
   Projection: tbl.colA, q1.colB, q1.colC
     Inner Join: Using tbl.colB = q1.colB
       TableScan: tbl projection=[colA, colB]
       SubqueryAlias: q1
         Projection: tbl.colB, COUNT(DISTINCT tbl.colA) AS colC
           Projection: group_alias_0 AS tbl.colB, COUNT(alias1) AS COUNT(DISTINCT tbl.colA)
             Aggregate: groupBy=[[group_alias_0]], aggr=[[COUNT(alias1)]]
               Aggregate: groupBy=[[tbl.colB AS group_alias_0, tbl.colA AS alias1]], aggr=[[]]
                 TableScan: tbl projection=[colA, colB]
   ```
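
   For context, the two stacked `Aggregate` nodes in the plan compute the count distinct in two steps: the inner one groups by (colB, colA) with no aggregate expressions, which deduplicates the pairs, and the outer one groups by colB and applies a plain `COUNT`. A minimal Python sketch of that equivalence, using made-up rows and no DataFusion API, and ignoring NULL handling:

```python
from collections import defaultdict

# Made-up (colB, colA) rows, purely for illustration.
rows = [("x", 1), ("x", 1), ("x", 2), ("y", 3)]

# Direct form: COUNT(DISTINCT colA) grouped by colB.
distinct_values = defaultdict(set)
for col_b, col_a in rows:
    distinct_values[col_b].add(col_a)
direct_counts = {b: len(vals) for b, vals in distinct_values.items()}

# Rewritten form, step 1 (inner Aggregate): group by (colB, colA) with no
# aggregate expressions, which leaves one row per distinct pair.
pairs = set(rows)

# Step 2 (outer Aggregate): group by colB and count the remaining rows.
rewritten_counts = defaultdict(int)
for col_b, _col_a in pairs:
    rewritten_counts[col_b] += 1

print(direct_counts == dict(rewritten_counts))
```

   Both forms give the same per-group counts; the rewrite only changes how the result is computed, so the column naming introduced by the extra `Projection` is the part that matters for this issue.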
   
   The `group_alias_0 AS tbl.colB` fragment (which is introduced by the 
`SingleDistinctToGroupBy` optimizer rule) creates a new unqualified column 
whose name is the literal string "tbl.colB". That is not the same thing as the 
original qualified column "tbl"."colB" (relation "tbl", name "colB"). The join 
on `tbl.colB = q1.colB` then fails to match the "tbl.colB" column during 
physical planning.
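
   To make the distinction concrete, here is a toy model of a column as an optional relation qualifier plus a name. The `Column` class and `flat_name` method below are hypothetical stand-ins written for this illustration, not DataFusion's actual types:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for a plan column: optional qualifier plus name."""
    relation: Optional[str]
    name: str

    def flat_name(self) -> str:
        # Render as "relation.name" when qualified, otherwise just the name.
        return f"{self.relation}.{self.name}" if self.relation else self.name


# Column as produced by the table scan: qualifier "tbl", name "colB".
qualified = Column(relation="tbl", name="colB")

# Column as produced by `group_alias_0 AS tbl.colB`: no qualifier, and a
# name that happens to contain a dot.
aliased = Column(relation=None, name="tbl.colB")

# The two render identically in a printed plan...
same_display = qualified.flat_name() == aliased.flat_name()

# ...but they are different columns when compared field by field.
same_column = qualified == aliased

print(same_display, same_column)
```

   Under this model the two columns look identical in the printed plan but do not compare equal, which is consistent with the lookup failing only at physical planning time.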
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

