adriangb commented on PR #19538:
URL: https://github.com/apache/datafusion/pull/19538#issuecomment-3806270548
@alamb I've renamed the terminology to:
```rust
pub enum ExpressionPlacement {
/// Argument is a literal constant value or an expression that can be
/// evaluated to a constant at planning time.
Literal,
/// Argument is a simple column reference.
Column,
/// Argument is a complex expression that can be safely placed at leaf
nodes.
/// For example, if `get_field(struct_col, 'field_name')` is implemented
as a
/// leaf-pushable expression, then it would return this variant.
/// Then `other_leaf_function(get_field(...), 42)` could also be
classified as
/// leaf-pushable using the knowledge that `get_field(...)` is
leaf-pushable.
PlaceAtLeafs,
/// Argument is a complex expression that should be placed at root nodes.
/// For example, `min(col1 + col2)` is not leaf-pushable because it
requires per-row computation.
PlaceAtRoot,
}
```
I.e. `Trival` -> `PlaceAtLeafs` and `NonTrivial` -> `PlaceAtRoot.
I think this still doesn't capture all of the nuance, e.g. in reality the
complex expressions may not be optimal to place at the root of the tree if the
root is wider than some other point in the plan e.g. because of joins. I'm open
to naming suggestions that don't imply it should be pushed into the root e.g.
`PlaceAfterRepartition`?
In my mind the two main things to do there are:
1. Scrutinize this bit of code:
https://github.com/apache/datafusion/pull/19538/changes#r2721295294
2. Try to fit in the big picture of the logical optimizer rule part of this.
In particular, I wonder if we don't need the `FilterExec` / `SortExec` parts of
this PR if that can be handled by a logical optimizer rule. The part we for
sure need is the `RepartitionExec` part since `RepartitionExec` only exists in
the physical plan layer and the transformation into a physical plan converts
`Projection -> Scan` into `ProjectionExec -> RepartitionExec ->
DataSourceExec`. I do want to explore some alternatives though: can we simply
do this pushdown before introducing RepartitionExec? That is, if we have a
logical optimizer rule that places a `Projection` right next to scans, can we
then run the physical projection pushdown rule before the physical plan is
re-arranged too much so that the `DataSourceExec` already owns the projections?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]