berkaysynnada commented on issue #15886:
URL: https://github.com/apache/datafusion/issues/15886#issuecomment-2839131363

   > I have to say it was very much unexpected. As a sanity check, I compared 
to Postgres which does not remove the sorting operation. The Postgres docs say 
that CTEs "effectively serve as temporary tables that can be referenced from 
the FROM list" (https://www.postgresql.org/docs/current/sql-select.html), which 
I would read as to imply that they are not views. There is no documentation 
under the `ORDER BY` clause that states its applicability to CTEs (or views).
   > 
   > I think optimizations that change the semantics of the query, legal 
transformation or not by the SQL standard, should be explicitly opt-in (and 
would still classify this issue as a bug)
   
   Approaching this case practically: when ORDER BY clauses are given in 
subqueries, they are converted into SortExecs somewhere in the plan. 
However, in enforce_sorting, we don't track ordering requirements through 
SortExecs directly (otherwise, we wouldn't be able to eliminate truly 
unnecessary SortExecs). Instead, we track the requirement by inserting an 
OutputRequirementExec at the top of the plan, which corresponds to the global 
ordering, that is, the ordering expected when an explicit ORDER BY is given in 
the outermost query.
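
   To make the consequence concrete, here is a toy model of that behavior. It is 
only a sketch, not DataFusion code: the plan is a hypothetical list of 
`(operator, argument)` tuples from the top of the plan down to the leaf, the 
function name is made up, and every non-sort operator is assumed to preserve 
its input order. The only requirement tracked is the global one.

```python
# Toy model only: the tuples and the function name are hypothetical
# stand-ins for illustration, not DataFusion types or APIs.

def eliminate_sorts_global_only(plan, global_ordering):
    """Drop every sort that the global ordering requirement does not need.

    plan: list of (operator, argument) tuples, top of the plan first.
    global_ordering: tuple of column names required at the plan root,
        or None when the outermost query has no ORDER BY.
    """
    required = global_ordering  # ordering still owed to the operators above
    kept = []
    for op, arg in plan:
        if op == "sort":
            # Keep the sort only if its key starts with the pending requirement.
            if required is not None and arg[:len(required)] == required:
                kept.append((op, arg))
                required = None  # satisfied; nothing below must be ordered
            # Otherwise nothing above asked for this ordering: drop the sort.
            continue
        kept.append((op, arg))  # assumed to be order-preserving
    return kept


# A sort that comes from an ORDER BY inside a CTE or subquery.
cte_plan = [("projection", None), ("sort", ("a",)), ("scan", "t")]

# No ORDER BY in the outermost query: the inner sort is eliminated.
without_outer_order_by = eliminate_sorts_global_only(cte_plan, None)

# The outermost query asks for the same ordering: the sort survives.
with_outer_order_by = eliminate_sorts_global_only(cte_plan, ("a",))
```

   In this model the inner sort disappears as soon as the outermost query has no 
ORDER BY of its own, because nothing records that the subquery asked for it.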
   
   If we decide to introduce a new config option for this, we first need to 
improve the optimizer phase. Specifically, OutputRequirementExecs could be 
inserted at intermediate nodes in the plan, meaning the subplan beneath must 
guarantee the required ordering at that point.
   So, TL;DR: unless we adapt the current enforce_sorting rule accordingly, we 
risk losing the ability to eliminate unnecessary SortExecs in some cases.
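
   The idea of intermediate requirement nodes can be sketched with a toy model. 
This is an illustration under stated assumptions, not the actual 
enforce_sorting rule: the plan is a hypothetical list of `(operator, argument)` 
tuples from the top of the plan down to the leaf, `"require"` stands in for an 
OutputRequirementExec, the function name is made up, and every operator other 
than a sort is assumed to preserve its input order.

```python
# Toy model only: the tuples and the function name are hypothetical
# stand-ins for illustration, not DataFusion types or APIs.

def eliminate_sorts_with_requirements(plan):
    """Keep a sort only when a requirement node above it asks for its ordering.

    plan: list of (operator, argument) tuples, top of the plan first.
    A ("require", ordering) entry may appear anywhere, not only at the top.
    """
    required = None  # ordering the subplan beneath must still deliver
    kept = []
    for op, arg in plan:
        if op == "require":
            required = arg  # the subplan beneath must provide this ordering
            kept.append((op, arg))
        elif op == "sort":
            if required is not None and arg[:len(required)] == required:
                kept.append((op, arg))
                required = None  # satisfied; nothing below must be ordered
            # Otherwise no pending requirement needs this sort: drop it.
        else:
            kept.append((op, arg))  # assumed to be order-preserving
    return kept


# Outer ORDER BY b, a subquery with ORDER BY a, and a redundant sort on c.
plan = [
    ("require", ("b",)),
    ("sort", ("b",)),
    ("projection", None),
    ("require", ("a",)),   # intermediate requirement for the subquery
    ("sort", ("a",)),
    ("sort", ("c",)),      # overwritten by the sort above it, so unnecessary
    ("scan", "t"),
]
optimized = eliminate_sorts_with_requirements(plan)

# The same plan without the intermediate requirement node.
plan_top_only = [node for node in plan if node != ("require", ("a",))]
optimized_top_only = eliminate_sorts_with_requirements(plan_top_only)
```

   With the intermediate requirement, the subquery's sort on `a` is kept while 
the redundant sort on `c` is still removed. With only the top-level 
requirement, both inner sorts are dropped.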


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

