Re: [PR] fix: add missing columns into list directly [datafusion]

via GitHub Sat, 25 Jan 2025 07:14:57 -0800


jonahgao commented on PR #14180:
URL: https://github.com/apache/datafusion/pull/14180#issuecomment-2613997693


   > @jonahgao `select_to_plan` only works with the SQL API, but sometimes people 
use the `DataFrame` API directly, as in `test_distinct_sort_by_unprojected`, 
so checking only in `select_to_plan` does not work for the `DataFrame` API.
   
   My plan is for the DataFrame to keep using `add_missing_columns`, and for 
the SQL API to reuse the processing logic of `HAVING`. This can make the 
planning of SQL `ORDER BY` faster and more accurate. 
   
   SQL has well-known specs for handling `ORDER BY`.  For example, `ORDER BY` 
can only reference columns from the SELECT list and the table expression, and 
there are limitations imposed by the GROUP BY clause.  I haven’t found similar 
specs for sorting DataFrames, and I think they are different concepts.
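
   To illustrate the SQL side of this, here is a minimal sketch of the `ORDER BY` name-resolution rule. It uses Python's built-in `sqlite3` as a stand-in engine, not DataFusion, and the table `t` with columns `a` and `b` is made up for the example. SQLite is more lenient than the standard around `GROUP BY` and `DISTINCT`, so those limitations are deliberately not shown here.

```python
import sqlite3

# Hypothetical table used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 30), (2, 10), (3, 20)])

# ORDER BY may reference a column of the table expression
# even though it is not in the SELECT list.
by_table_column = [row[0] for row in conn.execute("SELECT a FROM t ORDER BY b")]

# ORDER BY may also reference an alias defined in the SELECT list.
by_alias = [
    row[0] for row in conn.execute("SELECT a, -a AS neg FROM t ORDER BY neg")
]

# A name found in neither the SELECT list nor the table is rejected.
try:
    conn.execute("SELECT a FROM t ORDER BY missing")
    unknown_rejected = False
except sqlite3.OperationalError:
    unknown_rejected = True

print(by_table_column, by_alias, unknown_rejected)
```

   The first query is the case where a planner has to carry an unprojected column (`b`) through to the sort, which is the situation `add_missing_columns` deals with.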


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


