[GitHub] [arrow-datafusion] waynexia commented on issue #2994: single_distinct_to_groupby panic when group by expr is a binaryExpr

GitBox Tue, 02 Aug 2022 02:36:25 -0700


waynexia commented on issue #2994:
URL: 
https://github.com/apache/arrow-datafusion/issues/2994#issuecomment-1202253162


   Thanks for your reproducer @jiacai2050. Here are some investigations:
   
   Looks like the `single_distinct_to_groupby` optimizer doesn't consider the 
original `GROUP BY` contains expression like `a+1` or `b%2`. It copies `GROUP 
BY`s to the inner subquery, which may cause two problems:
   - Schema doesn't match. The schema of subquery will become something like 
`#table.a + Int64(1)`. And thus the outer `GROUP BY` cannot get a column `a` 
from it.
   - Duplicated evaluation. Expressions in `GROUP BY` cannot be calculated 
twice.
   
   To fix this, we need to alias the "original" `GROUP BY` in the subquery and 
refer to that aliased column in outer. E.g.:
   
   - Before optimize
     ```sql
     SELECT id+1, count(distinct(bank_account))
     FROM users
     GROUP BY id+1;
     ```
   - After optimize
     ```sql
     SELECT group_by_alias1 as "id+1", count(bank_account)
     FROM (
       SELECT id+1 as group_by_alias1, bank_account
       FROM users 
       GROUP BY id+1, bank_account
     )
     GROUP BY group_by_alias1;
     ```
   
   I'll submit a fix around this or next week.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow-datafusion] waynexia commented on issue #2994: single_distinct_to_groupby panic when group by expr is a binaryExpr

Reply via email to