[PR] GROUP-BY prioritizes input columns in case of ambiguity [arrow-datafusion]

via GitHub Wed, 14 Feb 2024 08:57:56 -0800


jonahgao opened a new pull request, #9228:
URL: https://github.com/apache/arrow-datafusion/pull/9228


   ## Which issue does this PR close?
   Closes #9162.
   
   ## Rationale for this change
   
   
   When a column referenced by GROUP BY exists in both the select list and the 
input, the one from the input should take priority. In issue #9162, there 
are two references with the same name: one is an unqualified column named 
"t.a", and the other is the qualified column t.a.
   
   This is the practice of many databases, including PostgreSQL, Oracle, MySQL, 
and DuckDB. 
   The PostgreSQL documentation 
[explains](https://www.postgresql.org/docs/current/sql-select.html#SQL-GROUPBY) 
this behavior:
   > An expression used inside a grouping_element can be an input column name, 
or the name or ordinal number of an output column (SELECT list item), or an 
arbitrary expression formed from input-column values. In case of ambiguity, a 
GROUP BY name will be interpreted as an **input-column** name rather than an 
output column name.
   
   ## What changes are included in this PR?
   Prioritize searching the schema of the base plan when generating GROUP BY 
expressions.
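
The lookup order described above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in this PR: `Resolved` and `resolve_group_by_name` are hypothetical names, and plain string slices stand in for the base plan's schema and the select list's aliases.

```rust
/// Where a GROUP BY name was resolved (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum Resolved {
    /// The name matched a column of the input (base plan) schema.
    InputColumn(String),
    /// The name matched only an alias in the SELECT list.
    SelectAlias(String),
}

/// Resolve a GROUP BY name, checking input columns before SELECT aliases.
/// Hypothetical helper; it does not use or mirror any DataFusion API.
fn resolve_group_by_name(
    name: &str,
    input_columns: &[&str],
    select_aliases: &[&str],
) -> Option<Resolved> {
    // 1. The input schema wins whenever it contains the name.
    if input_columns.iter().any(|c| *c == name) {
        return Some(Resolved::InputColumn(name.to_string()));
    }
    // 2. Fall back to output column names only if the input has no match.
    if select_aliases.iter().any(|a| *a == name) {
        return Some(Resolved::SelectAlias(name.to_string()));
    }
    // 3. The name is unknown to both.
    None
}
```

With this ordering, a name present in both lists resolves to the input column, which matches the rule quoted from the PostgreSQL documentation; an alias is used only when no input column has that name.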
   
   
   ## Are these changes tested?
   Yes
   
   ## Are there any user-facing changes?
   No


