xiedeyantu opened a new issue, #21361: URL: https://github.com/apache/datafusion/issues/21361
### Is your feature request related to a problem or challenge? DataFusion currently propagates functional dependencies through aggregates and joins, but it does not use them to simplify `ORDER BY` clauses. As a result, queries like: ``` SELECT deptno, SUM(sal) AS total_sal FROM emp GROUP BY deptno ORDER BY deptno, total_sal ``` still keep both sort keys in the physical plan, even though `deptno` -> `total_sal` already holds after the aggregation. This leads to extra sort work, unnecessary comparison cost, and avoidable memory usage. The same problem appears in cases where a later sort key is functionally determined by earlier keys. Because `Sort` is order-sensitive, only suffix keys that are provably redundant should be removed, and the current optimizer does not do that. ### Describe the solution you'd like I would like DataFusion to prune functionally redundant ORDER BY expressions during logical optimization. For example, when the input schema or an intermediate plan establishes that an earlier ordering key functionally determines a later one, the optimizer should drop the redundant suffix key while preserving sort semantics. In the example above, ORDER BY deptno, total_sal should be simplified to ORDER BY deptno. This should be implemented in a way that respects sort order, so only trailing expressions that are fully determined by the already-kept prefix may be removed. ### Describe alternatives you've considered _No response_ ### Additional context _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
