jonahgao commented on PR #14180: URL: https://github.com/apache/datafusion/pull/14180#issuecomment-2613997693
> @jonahgao `select_to_plan` only works with the SQL API, but sometimes people use the `DataFrame` API directly, which is the case for `test_distinct_sort_by_unprojected`, so checking only in `select_to_plan` does not work for the `DataFrame` API.

My plan is for the DataFrame API to keep using `add_missing_columns`, and for the SQL API to reuse the processing logic of `HAVING`. This can make the planning of SQL `ORDER BY` faster and more accurate.

SQL has well-known specs for handling `ORDER BY`. For example, `ORDER BY` can only reference columns from the `SELECT` list and the table expression, and the `GROUP BY` clause imposes further limitations. I haven't found similar specs for sorting DataFrames, and I think they are different concepts.
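To make the SQL-side rule above concrete, here is a toy sketch of how an `ORDER BY` column might be resolved. It is not DataFusion code: `Query`, `Resolved`, and `resolve_order_by` are hypothetical names invented for illustration, and the model assumes bare column names only (no expressions, aliases, or aggregates).

```rust
/// Where an ORDER BY column was resolved from (hypothetical type).
#[derive(Debug, PartialEq)]
enum Resolved {
    SelectList,
    TableExpression,
}

/// Hypothetical, heavily simplified view of a SQL query:
/// every item is a bare column name.
struct Query<'a> {
    select_list: &'a [&'a str],
    table_columns: &'a [&'a str],
    /// `Some(keys)` when the query has a GROUP BY clause.
    group_by: Option<&'a [&'a str]>,
}

/// Resolve one ORDER BY column under the simplified rules:
/// 1. the SELECT list is consulted first;
/// 2. otherwise the column must exist in the table expression;
/// 3. with GROUP BY, a column outside the SELECT list must be a grouping key.
fn resolve_order_by(query: &Query, column: &str) -> Result<Resolved, String> {
    if query.select_list.iter().any(|c| *c == column) {
        return Ok(Resolved::SelectList);
    }
    if !query.table_columns.iter().any(|c| *c == column) {
        return Err(format!("column '{column}' not found"));
    }
    match query.group_by {
        Some(keys) if !keys.iter().any(|k| *k == column) => Err(format!(
            "column '{column}' must appear in GROUP BY to be used in ORDER BY"
        )),
        _ => Ok(Resolved::TableExpression),
    }
}

fn main() {
    // Models: SELECT a FROM t GROUP BY a, b ORDER BY <column>
    let query = Query {
        select_list: &["a"],
        table_columns: &["a", "b", "c"],
        group_by: Some(&["a", "b"]),
    };
    for column in ["a", "b", "c", "d"] {
        println!("ORDER BY {column}: {:?}", resolve_order_by(&query, column));
    }
}
```

In this model `a` resolves from the `SELECT` list, `b` from the table expression because it is a grouping key, and `c` and `d` are rejected. A DataFrame `sort` has no `SELECT` list or `GROUP BY` clause to consult in this way, which is the distinction drawn above.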
