alamb commented on a change in pull request #8917:
URL: https://github.com/apache/arrow/pull/8917#discussion_r546286050
##########
File path: rust/datafusion/src/datasource/datasource.rs
##########
@@ -48,9 +66,19 @@ pub trait TableProvider {
&self,
projection: &Option<Vec<usize>>,
batch_size: usize,
+ filters: &[Expr],
) -> Result<Arc<dyn ExecutionPlan>>;
/// Returns the table Statistics
/// Statistics should be optional because not all data sources can provide statistics.
fn statistics(&self) -> Statistics;
+
+ /// Tests whether the table provider can make use of a filter expression
+ /// to optimise data retrieval.
+ fn test_filter_pushdown(
Review comment:
@rdettai -- I think it is reasonable to assume something (I am
purposely being vague here) could be done at logical planning time that could
save execution time later on. While I don't fully follow the scenario you are
describing, I can see how in some cases you won't save much by trying to do
partition pruning at the logical planning level.
However, I can think of scenarios where pruning at logical planning time
might pay off (deciding which hosts to send a plan to, for example). You might
be right that this should be done at physical planning time instead.
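To make the host-selection scenario a bit more concrete, here is a minimal sketch of the kind of planning-time pruning I have in mind. Everything in it is an assumption for illustration: `Filter`, `Partition`, `may_match`, and `hosts_for_plan` are hypothetical stand-ins, not DataFusion types or functions, and real filters would be `Expr` values rather than this two-variant enum.

```rust
// Hypothetical, simplified stand-ins; none of these are DataFusion types.

/// A pushed-down filter on an integer column.
#[derive(Debug, Clone, PartialEq)]
pub enum Filter {
    /// `column >= value`
    GtEq { column: String, value: i64 },
    /// `column < value`
    Lt { column: String, value: i64 },
}

/// One partition of a table: the host that stores it, plus the min/max
/// values of the column the table is partitioned on.
#[derive(Debug, Clone, PartialEq)]
pub struct Partition {
    pub host: String,
    pub column: String,
    pub min: i64,
    pub max: i64,
}

/// Returns false only when the min/max range proves that no row in the
/// partition can satisfy the filter. Filters on other columns cannot be
/// used for pruning, so they conservatively return true.
fn may_match(p: &Partition, f: &Filter) -> bool {
    match f {
        Filter::GtEq { column, value } => column != &p.column || p.max >= *value,
        Filter::Lt { column, value } => column != &p.column || p.min < *value,
    }
}

/// Planning-time pruning: the sorted, de-duplicated list of hosts that
/// still need to receive the plan once the filters are taken into account.
pub fn hosts_for_plan(partitions: &[Partition], filters: &[Filter]) -> Vec<String> {
    let mut hosts: Vec<String> = partitions
        .iter()
        .filter(|p| filters.iter().all(|f| may_match(p, f)))
        .map(|p| p.host.clone())
        .collect();
    hosts.sort();
    hosts.dedup();
    hosts
}
```

Whether something like this runs during logical or physical planning is exactly the open question; the sketch only shows that the filters alone can be enough to decide where a plan needs to go.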
In my opinion, DataFusion is moving fast enough at this point, and we are
discovering the requirements as we go, so a potentially more complicated API
is an acceptable tradeoff, especially given how well this one is documented.