alamb commented on issue #9111: URL: https://github.com/apache/arrow-datafusion/issues/9111#issuecomment-1924653589
I have two questions:

1. Do you know of any examples of "algorithmic limitations" (e.g. plans where unnecessary columns are carried through)?
2. How does this compare to how pushdown is done during LogicalPlanning (e.g. https://github.com/apache/arrow-datafusion/blob/main/datafusion/optimizer/src/push_down_projection.rs)? Are you planning changes / extensions to that logic too? One of the reasons to do pushdown at that level is that it is less complicated (e.g. output indexes aren't used).

The idea of supporting ProjectionPushdown for user-defined plans sounds like a good idea to me 👍. If that is indeed the use case, adding a test showing that it doesn't work today and works after your changes is 🆗

> I will plan to open a PR for suggestions, especially on how to update the ExecutionPlan API to get rid of if else structure of all plans.

I assume you are referring to this code: https://github.com/apache/arrow-datafusion/blob/7641a3228156aab0e48c4bab5a6834b44f722d89/datafusion/core/src/physical_optimizer/projection_pushdown.rs#L105-L143

Maybe you could add an API like this:

```rust
trait ExecutionPlan {
    // ...

    /// Return a copy of this plan that only produces the specified outputs, if possible, in the specified order.
    /// The projection is a `BTreeSet` as it is unique and in increasing order (some nodes like
    /// `HashAggregateExec` can't reorder their outputs).
    ///
    /// For plans such as `ProjectionExec`, projecting a subset of columns will reduce the
    /// expression list.
    ///
    /// For some plans such as `HashAggregateExec`, projection may not be possible (e.g. it is not
    /// possible to remove group columns from the output).
    ///
    /// By default, returns `Ok(None)`.
    fn try_project(&self, projection: BTreeSet<usize>) -> Result<Option<Arc<dyn ExecutionPlan>>> {
        Ok(None)
    }

    // ...
}
```

🤔
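To make the shape of that suggestion concrete, here is a minimal, self-contained sketch of how the optimizer side could use such a method instead of matching on every plan type. Everything in it is an assumption for illustration: `Plan`, `ToyProjection`, `ToyAggregate` and `push_projection` are hypothetical stand-ins, not DataFusion types, and the error type is a plain `String` rather than DataFusion's `Result`.

```rust
use std::collections::BTreeSet;
use std::sync::Arc;

/// Hypothetical, simplified stand-in for DataFusion's `ExecutionPlan`.
trait Plan {
    /// Names of the output columns, in output order.
    fn columns(&self) -> Vec<String>;

    /// Return a copy of this plan producing only the requested output
    /// columns, or `Ok(None)` if this node cannot be projected.
    fn try_project(
        &self,
        _projection: &BTreeSet<usize>,
    ) -> Result<Option<Arc<dyn Plan>>, String> {
        Ok(None)
    }
}

/// Toy projection node: one expression (here just a name) per output column.
struct ToyProjection {
    exprs: Vec<String>,
}

impl Plan for ToyProjection {
    fn columns(&self) -> Vec<String> {
        self.exprs.clone()
    }

    /// Keep only the requested expressions, in increasing index order.
    fn try_project(
        &self,
        projection: &BTreeSet<usize>,
    ) -> Result<Option<Arc<dyn Plan>>, String> {
        let mut exprs = Vec::with_capacity(projection.len());
        for &i in projection {
            let expr = self
                .exprs
                .get(i)
                .ok_or_else(|| format!("column index {i} out of bounds"))?;
            exprs.push(expr.clone());
        }
        Ok(Some(Arc::new(ToyProjection { exprs })))
    }
}

/// Toy aggregate node that relies on the default `try_project`,
/// modelling a plan that cannot drop any of its outputs.
struct ToyAggregate {
    columns: Vec<String>,
}

impl Plan for ToyAggregate {
    fn columns(&self) -> Vec<String> {
        self.columns.clone()
    }
}

/// Optimizer-side helper: no per-plan if/else, just ask the node.
/// Falls back to the original plan when the node declines.
fn push_projection(
    plan: Arc<dyn Plan>,
    projection: &BTreeSet<usize>,
) -> Result<Arc<dyn Plan>, String> {
    Ok(plan.try_project(projection)?.unwrap_or(plan))
}
```

With this shape, a user-defined plan opts in simply by overriding `try_project`, and plans that don't override it are left untouched by the rule.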
