jayshrivastava commented on issue #21207: URL: https://github.com/apache/datafusion/issues/21207#issuecomment-4857069773
👋🏽 Hey folks, I would like to revive this discussion about `ExecutionPlan::apply_expressions` https://github.com/apache/datafusion/pull/22415. In short, it looks like we weren't confident in adding it as a required trait method, especially since it's a bit difficult to implement correctly. The plan was to re-add it when its actually used. There's not many use cases in single node datafusion that I can think of (one [here](https://github.com/apache/datafusion/pull/20009)), but there are several important in datafusion-distributed: - collect dynamic filters from an `ExecutionPlan` executed on a worker and send them back to the coordinator for displaying - collect dynamic filters from `ExecutionPlan` producer nodes to send to remote consumers - call `DynamicFilterPhysicalExpr::update()` on remote `ExecutionPlan` consumer nodes The only way to get `DynamicFilterPhysicalExpr` from `ExecutionPlan` is downcasting from `ExecutionPlan`, which hurts extensibility IMO. It would be better to go through a trait method. ``` - FilterExec::predicate() - HashJoinExec::dynamic_filter_expr() - AggregateExec::dynamic_filter_expr() - SortExec::dynamic_filter_expr() - DataSourceExec::data_source.downcast_ref::FileScanConfig()?.file_source.filter() ``` Since we have `gather_filters_for_pushdown` and `handle_child_pushdown_result`, it could make sense to have something directly related to filters instead of `apply_expressions`? It must be a required trait. ``` /// Behavior is undefined unless all optimizer rules have run. /// /// Returns all filter expressions. This node may store filter expressions as a producer (ex. dynamic filters) or consume them via filter pushdown. /// any case, this method must return these filters. This method does not need to return filters which are not stored because they were pushed down. fn filters() -> Arc<dyn PhysicalExpr> ``` @adriangb @LiaCastaneda @alamb -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
