alamb commented on issue #19550:
URL: https://github.com/apache/datafusion/issues/19550#issuecomment-3711615544
I think this also has elements of a cost decision.
In your example above, if you take it to its logical conclusion, you might go
from
```
Projection: lower(get_field(t.user, Utf8("email")))
  Filter: get_field(t.user, Utf8("address")) ILIKE Utf8("%nyc%")
    SubqueryAlias: t
      TableScan: test.parquet
```
to this (if the expression can't be pushed down):
```
Filter: t ILIKE Utf8("%nyc%")
  Projection: get_field(t.user, Utf8("address")) as t, lower(get_field(t.user, Utf8("email")))
    SubqueryAlias: t
      TableScan: test.parquet
```
This plan could be much worse: it now calls `lower()` on *all* rows, rather
than just the rows that pass the predicate, and then discards most of the
results.
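To make the cost difference concrete, here is a rough sketch that counts how often the lowercasing runs under each ordering. It is a toy model over an in-memory slice, not DataFusion code: the `User` struct, `ilike_contains`, `filter_then_project`, and `project_then_filter` are all hypothetical names invented for illustration.

```rust
/// Toy row standing in for the `user` struct column (hypothetical data model).
struct User {
    email: String,
    address: String,
}

/// Case-insensitive "contains", standing in for `ILIKE '%needle%'`.
fn ilike_contains(haystack: &str, needle: &str) -> bool {
    haystack.to_lowercase().contains(&needle.to_lowercase())
}

/// First plan: filter on the address, then lowercase the email of the survivors.
/// Returns the projected values and the number of lowercasing calls made.
fn filter_then_project(rows: &[User], needle: &str) -> (Vec<String>, usize) {
    let mut calls = 0;
    let out: Vec<String> = rows
        .iter()
        .filter(|u| ilike_contains(&u.address, needle))
        .map(|u| {
            calls += 1;
            u.email.to_lowercase()
        })
        .collect();
    (out, calls)
}

/// Second plan: project both expressions for every row, then filter.
/// The lowercasing runs once per input row, even for rows that are discarded.
fn project_then_filter(rows: &[User], needle: &str) -> (Vec<String>, usize) {
    let mut calls = 0;
    let projected: Vec<(String, String)> = rows
        .iter()
        .map(|u| {
            calls += 1;
            (u.address.clone(), u.email.to_lowercase())
        })
        .collect();
    let out: Vec<String> = projected
        .into_iter()
        .filter(|(addr, _)| ilike_contains(addr, needle))
        .map(|(_, email)| email)
        .collect();
    (out, calls)
}
```

Both functions return the same rows; only the call count differs, and the gap grows as the predicate gets more selective.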
I don't quite follow what DuckDB does, but from a really high level it
almost seems like they are special-casing struct field accesses. That makes
sense, as those expressions in particular will always be faster (and there is
no costing / tradeoff involved).
What if we made a special optimizer pass that only pushed down struct
accesses (maybe "Expr::is_trivial" or something else equivalent)?
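As a rough illustration of what such a check might look like, here is a sketch over a toy expression tree. This `Expr` enum is a simplified stand-in, not DataFusion's real `Expr`, and `is_trivial` is the hypothetical method floated above. Treating column references and literals as trivial is my own assumption.

```rust
/// Toy expression tree; a stand-in for DataFusion's `Expr`, not its real definition.
#[allow(dead_code)] // some fields exist only to make the shapes recognizable
enum Expr {
    Column(String),
    Literal(String),
    /// Struct field access, e.g. `get_field(t.user, 'email')`.
    GetField { base: Box<Expr>, field: String },
    /// Any other scalar function call, e.g. `lower(...)`.
    ScalarFunction { name: String, args: Vec<Expr> },
}

impl Expr {
    /// Hypothetical check: true only for expressions assumed to be always
    /// cheap to evaluate early (columns, literals, and struct field accesses
    /// whose input is itself trivial).
    fn is_trivial(&self) -> bool {
        match self {
            Expr::Column(_) | Expr::Literal(_) => true,
            Expr::GetField { base, .. } => base.is_trivial(),
            // Everything else would need a real cost decision, so leave it in place.
            Expr::ScalarFunction { .. } => false,
        }
    }
}
```

With this, `get_field(t.user, 'address')` would qualify for pushdown while `lower(get_field(t.user, 'email'))` would stay where it is, which avoids the bad plan in the example.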
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]