adriangb commented on code in PR #18868:
URL: https://github.com/apache/datafusion/pull/18868#discussion_r2672284226


##########
datafusion/physical-optimizer/src/limit_pushdown.rs:
##########
@@ -337,4 +362,37 @@ fn add_global_limit(
     Arc::new(GlobalLimitExec::new(pushdown_plan, skip, fetch)) as _
 }
 
+/// Helper function to handle DataSourceExec preserve_order setting
+fn ensure_preserve_order_if_needed(
+    plan: Arc<dyn ExecutionPlan>,
+    order_sensitive: bool,
+) -> Arc<dyn ExecutionPlan> {
+    if !order_sensitive {
+        return plan;
+    }
+
+    let Some(data_source_exec) = 
plan.as_any().downcast_ref::<DataSourceExec>() else {
+        return plan;
+    };

Review Comment:
   I see yeah. I wonder if with the new sort pushdown rule we can tweak this?
   
   My thought is that the scan can have a sort order that was discovered or 
user provided but not promise to upstream nodes that it's going to output 
sorted results *until* the upstream nodes ask it to do so. At that point it 
would set this internal flag so when we get a limit we can remember that we 
promised our parent nodes we'd be outputting sorted results.
   
   Put another way, at the same time that a file scan node sets it's 
equivalence properties to output sorted results it should set this flag. So 
maybe all we need to do is in `ListingTable` set this flag if we also give the 
scan node a specified ordering? Then it gets set one of two ways:
   1. In the listing step, if we discover a sort order from Parquet metadata or 
if the user provided a sort order.
   2. Via the new sort pushdown rule.
   So either way, if the scan node is outputting sorted data the flag is set?
   
   I just dislike this sort of downcast matching in a generic engine like 
DataFusion. It means that custom scan nodes can't benefit from this feature, 
etc.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to