alamb commented on issue #9111:
URL: 
https://github.com/apache/arrow-datafusion/issues/9111#issuecomment-1924653589

   I have two questions:
   1. Do you know of any examples of "algorithmic limitations"  (e.g. plans 
where unnecessary columns are carried through)?
   2. How does this compare to how pushdown is done during LogicalPlanning 
(e.g. 
   
https://github.com/apache/arrow-datafusion/blob/main/datafusion/optimizer/src/push_down_projection.rs).
 Are you planning changes / extensions to that logic too? One of the reasons to 
do pushdown at that level is that it is less complicated (e.g. output indexes 
aren't used).
   
   Supporting ProjectionPushdown for user-defined plans sounds like a 
good idea to me 👍 . If that is indeed the use case, adding a test showing it 
doesn't work today and works after your changes is 🆗 
   
   >  I will plan to open a PR for suggestions, especially on how to update the 
ExecutionPlan API to get rid of if else structure of all plans.
   
   I assume you are referring to this code:
   
https://github.com/apache/arrow-datafusion/blob/7641a3228156aab0e48c4bab5a6834b44f722d89/datafusion/core/src/physical_optimizer/projection_pushdown.rs#L105-L143
   
   Maybe you could add an API like this:
   ```rust
   trait ExecutionPlan {
     // ...
     /// Return a copy of this plan that only produces the specified outputs,
     /// if possible, in the specified order.
     ///
     /// The projection is a `BTreeSet` as it is unique and in increasing order
     /// (some nodes, like `HashAggregateExec`, can't reorder their outputs).
     ///
     /// For plans such as `ProjectionExec`, projecting a subset of columns
     /// will reduce the expression list.
     ///
     /// For some plans such as `HashAggregateExec`, projection may not be
     /// possible (e.g. it is not possible to remove group columns from the
     /// output).
     ///
     /// By default, returns `Ok(None)`.
     fn try_project(&self, projection: BTreeSet<usize>) -> Result<Option<Arc<dyn ExecutionPlan>>> { Ok(None) }
     // ...
   }
   ```
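
To make the idea concrete, here is a rough, self-contained sketch using toy stand-ins rather than the real DataFusion types. `Plan`, `ToyProjection`, `ToyAggregate`, `select_columns` and `project_or_keep` are all hypothetical names invented for illustration, and the error type is simplified to `String`. It only shows how a per-node `try_project` could replace an if/else chain of downcasts in the optimizer:

```rust
use std::collections::BTreeSet;
use std::sync::Arc;

/// Toy stand-in for `ExecutionPlan` (hypothetical, not the DataFusion trait).
trait Plan {
    /// Names of the output columns, in order.
    fn output_columns(&self) -> Vec<String>;

    /// Return a copy of this plan producing only the requested output
    /// indexes, or `Ok(None)` if this node cannot be projected that way.
    fn try_project(
        &self,
        _projection: &BTreeSet<usize>,
    ) -> Result<Option<Arc<dyn Plan>>, String> {
        Ok(None)
    }
}

/// Pick the requested columns, in increasing index order.
fn select_columns(
    columns: &[String],
    projection: &BTreeSet<usize>,
) -> Result<Vec<String>, String> {
    projection
        .iter()
        .map(|&i| {
            columns
                .get(i)
                .cloned()
                .ok_or_else(|| format!("column index {i} out of range"))
        })
        .collect()
}

/// Toy projection node: one expression per output column.
struct ToyProjection {
    exprs: Vec<String>,
}

impl Plan for ToyProjection {
    fn output_columns(&self) -> Vec<String> {
        self.exprs.clone()
    }

    // Projecting a subset of columns just shrinks the expression list.
    fn try_project(
        &self,
        projection: &BTreeSet<usize>,
    ) -> Result<Option<Arc<dyn Plan>>, String> {
        let exprs = select_columns(&self.exprs, projection)?;
        let plan: Arc<dyn Plan> = Arc::new(ToyProjection { exprs });
        Ok(Some(plan))
    }
}

/// Toy aggregate: the first `num_group` outputs are group columns.
struct ToyAggregate {
    columns: Vec<String>,
    num_group: usize,
}

impl Plan for ToyAggregate {
    fn output_columns(&self) -> Vec<String> {
        self.columns.clone()
    }

    // Group columns cannot be removed, so decline if any would be dropped.
    fn try_project(
        &self,
        projection: &BTreeSet<usize>,
    ) -> Result<Option<Arc<dyn Plan>>, String> {
        if !(0..self.num_group).all(|i| projection.contains(&i)) {
            return Ok(None);
        }
        let columns = select_columns(&self.columns, projection)?;
        let plan: Arc<dyn Plan> = Arc::new(ToyAggregate {
            columns,
            num_group: self.num_group,
        });
        Ok(Some(plan))
    }
}

/// Optimizer side: ask the node itself, keep the plan unchanged on `None`.
fn project_or_keep(
    plan: Arc<dyn Plan>,
    projection: &BTreeSet<usize>,
) -> Result<Arc<dyn Plan>, String> {
    let projected = plan.try_project(projection)?;
    Ok(projected.unwrap_or(plan))
}
```

With something like this, the optimizer rule would not need to know about specific node types, and a user-defined plan could opt in just by overriding `try_project`.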
   
   🤔 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
