adriangb commented on PR #19538:
URL: https://github.com/apache/datafusion/pull/19538#issuecomment-3806270548

   @alamb I've renamed the terminology to:
   
   ```rust
   pub enum ExpressionPlacement {
       /// Argument is a literal constant value or an expression that can be
       /// evaluated to a constant at planning time.
       Literal,
       /// Argument is a simple column reference.
       Column,
       /// Argument is a complex expression that can be safely placed at leaf 
nodes.
       /// For example, if `get_field(struct_col, 'field_name')` is implemented 
as a
       /// leaf-pushable expression, then it would return this variant.
       /// Then `other_leaf_function(get_field(...), 42)` could also be 
classified as
       /// leaf-pushable using the knowledge that `get_field(...)` is 
leaf-pushable.
       PlaceAtLeafs,
       /// Argument is a complex expression that should be placed at root nodes.
       /// For example, `min(col1 + col2)` is not leaf-pushable because it 
requires per-row computation.
       PlaceAtRoot,
   }
   ```
   
   I.e. `Trival` -> `PlaceAtLeafs` and `NonTrivial` -> `PlaceAtRoot.
   I think this still doesn't capture all of the nuance, e.g. in reality the 
complex expressions may not be optimal to place at the root of the tree if the 
root is wider than some other point in the plan e.g. because of joins. I'm open 
to naming suggestions that don't imply it should be pushed into the root e.g. 
`PlaceAfterRepartition`?
   
   In my mind the two main things to do there are:
   1. Scrutinize this bit of code: 
https://github.com/apache/datafusion/pull/19538/changes#r2721295294
   2. Try to fit in the big picture of the logical optimizer rule part of this. 
In particular, I wonder if we don't need the `FilterExec` / `SortExec` parts of 
this PR if that can be handled by a logical optimizer rule. The part we for 
sure need is the `RepartitionExec` part since `RepartitionExec` only exists in 
the physical plan layer and the transformation into a physical plan converts 
`Projection -> Scan` into `ProjectionExec -> RepartitionExec -> 
DataSourceExec`. I do want to explore some alternatives though: can we simply 
do this pushdown before introducing RepartitionExec? That is, if we have a 
logical optimizer rule that places a `Projection` right next to scans, can we 
then run the physical projection pushdown rule before the physical plan is 
re-arranged too much so that the `DataSourceExec` already owns the projections?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to