Jefffrey opened a new issue, #5251:
URL: https://github.com/apache/arrow-datafusion/issues/5251

   **Describe the bug**
   
   In SQL, when you GROUP BY an ambiguous column, DataFusion does not return an error.
   
   **To Reproduce**
   Steps to reproduce the behavior:
   
   ```sql
   ❯ select * from test1 t1 join test2 t2 using (a);
   +---+---+---+---+---+
   | a | b | c | b | c |
   +---+---+---+---+---+
   | 1 | 2 | 3 | 2 | 3 |
   | 4 | 5 | 6 | 5 | 6 |
   +---+---+---+---+---+
   2 rows in set. Query took 0.013 seconds.
   ❯ select max(a) from test1 t1 join test2 t2 using (a) group by c;
   +-----------+
   | MAX(t1.a) |
   +-----------+
   | 1         |
   | 4         |
   +-----------+
   2 rows in set. Query took 0.014 seconds.
   ❯ explain select max(a) from test1 t1 join test2 t2 using (a) group by c;
   
   +---------------+----------------------------------------------------------------------------------------------------------------------------------+
   | plan_type     | plan
   +---------------+----------------------------------------------------------------------------------------------------------------------------------+
   | logical_plan  | Projection: MAX(t1.a)
   |               |   Aggregate: groupBy=[[t1.c]], aggr=[[MAX(t1.a)]]
   |               |     Inner Join: Using t1.a = t2.a
   |               |       SubqueryAlias: t1
   |               |         TableScan: test1 projection=[a, c]
   |               |       SubqueryAlias: t2
   |               |         TableScan: test2 projection=[a]
   | physical_plan | ProjectionExec: expr=[MAX(t1.a)@1 as MAX(t1.a)]
   |               |   AggregateExec: mode=FinalPartitioned, gby=[c@0 as c], aggr=[MAX(t1.a)]
   |               |     CoalesceBatchesExec: target_batch_size=8192
   |               |       RepartitionExec: partitioning=Hash([Column { name: "c", index: 0 }], 12), input_partitions=12
   |               |         AggregateExec: mode=Partial, gby=[c@1 as c], aggr=[MAX(t1.a)]
   |               |           CoalesceBatchesExec: target_batch_size=8192
   |               |             HashJoinExec: mode=Partitioned, join_type=Inner, on=[(Column { name: "a", index: 0 }, Column { name: "a", index: 0 })]
   |               |               CoalesceBatchesExec: target_batch_size=8192
   |               |                 RepartitionExec: partitioning=Hash([Column { name: "a", index: 0 }], 12), input_partitions=12
   |               |                   RepartitionExec: partitioning=RoundRobinBatch(12), input_partitions=1
   |               |                     CsvExec: files={1 group: [[home/jeffrey/Code/arrow-datafusion/datafusion-cli/test.csv]]}, has_header=true, limit=None, projection=[a, c]
   |               |               CoalesceBatchesExec: target_batch_size=8192
   |               |                 RepartitionExec: partitioning=Hash([Column { name: "a", index: 0 }], 12), input_partitions=12
   |               |                   RepartitionExec: partitioning=RoundRobinBatch(12), input_partitions=1
   |               |                     CsvExec: files={1 group: [[home/jeffrey/Code/arrow-datafusion/datafusion-cli/test.csv]]}, has_header=true, limit=None, projection=[a]
   |               |
   +---------------+----------------------------------------------------------------------------------------------------------------------------------+
   2 rows in set. Query took 0.007 seconds.
   ```
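
   The session above omits the table setup. A minimal setup that should reproduce it in datafusion-cli would be something like the following (a sketch, not from the original report: it assumes both tables are registered from the same CSV file, whose columns and rows are inferred from the first result and the `CsvExec` nodes in the plan):

   ```sql
   -- test.csv (assumed contents, inferred from the query results above):
   --   a,b,c
   --   1,2,3
   --   4,5,6
   CREATE EXTERNAL TABLE test1 STORED AS CSV WITH HEADER ROW LOCATION 'test.csv';
   CREATE EXTERNAL TABLE test2 STORED AS CSV WITH HEADER ROW LOCATION 'test.csv';
   ```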
   
   **Expected behavior**
   
   The query should return an error stating that the GROUP BY column `c` is ambiguous (it exists in both `t1` and `t2`), rather than silently binding it to `t1.c` as the logical plan above does (`groupBy=[[t1.c]]`).
   
   **Additional context**
   
   The same query on the latest Postgres raises an error about the ambiguity.
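
   For reference, the Postgres message is along these lines (approximate wording; the exact text may vary by version):

   ```
   ERROR:  column reference "c" is ambiguous
   ```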
   

