alamb commented on a change in pull request #8917:
URL: https://github.com/apache/arrow/pull/8917#discussion_r546286050



##########
File path: rust/datafusion/src/datasource/datasource.rs
##########
@@ -48,9 +66,19 @@ pub trait TableProvider {
         &self,
         projection: &Option<Vec<usize>>,
         batch_size: usize,
+        filters: &[Expr],
     ) -> Result<Arc<dyn ExecutionPlan>>;
 
     /// Returns the table Statistics
     /// Statistics should be optional because not all data sources can provide statistics.
     fn statistics(&self) -> Statistics;
+
+    /// Tests whether the table provider can make use of a filter expression
+    /// to optimise data retrieval.
+    fn test_filter_pushdown(

Review comment:
       @rdettai  -- I think it is reasonable to assume something (I am 
purposely being vague here) could be done at logical planning time that could 
save execution time later on. While I don't fully follow the scenario you are 
describing, I can see how in some cases you won't save much by trying to do 
partition pruning at the logical planning level.
   
   However, I can think of scenarios where you might be able to (deciding 
which hosts to send a plan to, for example). You might be right that this 
should be done at physical planning time.
   
   DataFusion is moving fast enough at this point that we are discovering the 
requirements as we go, so in my opinion a potentially more complicated API is 
an acceptable tradeoff, especially given how well this one is documented.
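
   To make the planning-time idea concrete, here is a rough, self-contained 
sketch of how a provider might answer a pushdown test and then prune 
partitions. Everything in it is hypothetical and only for illustration: the 
simplified `Expr`, the `PushDown` enum, and `PartitionedTable` are stand-ins, 
not the DataFusion types, and the method signature is an assumption rather 
than the one proposed in this PR.

```rust
// Hypothetical, simplified stand-ins; these are NOT the DataFusion types.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// `column = literal`
    ColumnEq { column: String, value: String },
    /// Any expression this toy provider does not understand.
    Other,
}

/// Hypothetical answer a provider gives the planner for one filter.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PushDown {
    /// The provider cannot use the filter; the planner keeps it.
    Unsupported,
    /// The provider uses the filter to skip data but may still return
    /// non-matching rows, so the planner keeps the filter as well.
    Inexact,
}

/// Toy table with one partition per distinct value of `partition_column`.
struct PartitionedTable {
    partition_column: String,
    partitions: Vec<String>,
}

impl PartitionedTable {
    /// Planning-time check: can this filter help data retrieval?
    fn test_filter_pushdown(&self, filter: &Expr) -> PushDown {
        match filter {
            Expr::ColumnEq { column, .. } if *column == self.partition_column => {
                PushDown::Inexact
            }
            _ => PushDown::Unsupported,
        }
    }

    /// Keep only the partitions that can contain rows matching the filters.
    /// Filters the provider does not understand never exclude a partition.
    fn prune(&self, filters: &[Expr]) -> Vec<&String> {
        self.partitions
            .iter()
            .filter(|p| {
                filters.iter().all(|f| match f {
                    Expr::ColumnEq { column, value }
                        if *column == self.partition_column =>
                    {
                        value == *p
                    }
                    _ => true,
                })
            })
            .collect()
    }
}
```

   In this sketch the pushdown test is cheap and touches no data, so it can 
run during logical planning, while `prune` could run at either stage. Which 
stage is the right one is exactly the open question in this thread.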




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

