xudong963 commented on code in PR #18868:
URL: https://github.com/apache/datafusion/pull/18868#discussion_r2670997727


##########
datafusion/physical-optimizer/src/limit_pushdown.rs:
##########
@@ -337,4 +362,37 @@ fn add_global_limit(
     Arc::new(GlobalLimitExec::new(pushdown_plan, skip, fetch)) as _
 }
 
+/// Helper function to handle DataSourceExec preserve_order setting
+fn ensure_preserve_order_if_needed(
+    plan: Arc<dyn ExecutionPlan>,
+    order_sensitive: bool,
+) -> Arc<dyn ExecutionPlan> {
+    if !order_sensitive {
+        return plan;
+    }
+
+    let Some(data_source_exec) = 
plan.as_any().downcast_ref::<DataSourceExec>() else {
+        return plan;
+    };

Review Comment:
   We could set the preserve_order flag to true when an order was pushed down 
into FileScanConfig.
   
   But here, I'm trying to solve the case described in 
https://github.com/apache/datafusion/pull/18868#issuecomment-3669612233
   
   > However, a complication arises: if the data distribution and physical 
ordering of table t already satisfy ORDER BY a, the EnforceSorting phase in the 
Physical Optimizer will remove the Sort node. Subsequently, the LIMIT is pushed 
down to the DataSource during the LimitPushdown phase.
   
   > In this scenario, we cannot rely solely on the presence of the LIMIT at 
the Parquet level to decide whether to prune. If we prune based on a limit that 
was originally associated with a removed Sort, we might violate the required 
global ordering.
   
   > Proposed Solution: To address this fundamentally, when a Sort node is 
removed because the distribution already matches the requirements, we should 
mark the resulting Limit node as 
["Order-Sensitive."](https://github.com/apache/datafusion/pull/18868/commits/46cf80b8b4ce49e96548b873369c66835629fbb2#diff-445a93b16e352019a131011d0574c5bc2e1ec32b830c0052876bdba76855e5f5R586-R593)
 During the subsequent LimitPushdown, we can [detect this flag and set 
](https://github.com/apache/datafusion/pull/18868/commits/46cf80b8b4ce49e96548b873369c66835629fbb2#diff-baa20d781df314d37cdfd47e897af90ae4aabf1aa2b892a90bf0b3bdd21e020eR253-R279)the
 preserve_order attribute to true in the ParquetOpener. This ensures that Limit 
Pruning is bypassed when the global order must be maintained.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to