alamb commented on issue #19550:
URL: https://github.com/apache/datafusion/issues/19550#issuecomment-3711615544

   I think this also has some elements of a cost decision.
   
   In your example above, if you take it to its logical conclusion you might 
go from 
   
   ```
   Projection: lower(get_field(t.user, Utf8("email")))
     Filter: get_field(t.user, Utf8("address")) ILIKE Utf8("%nyc%")
       SubqueryAlias: t
         TableScan: test.parquet
   ```
   
   To this (if the expression can't be pushed down)
   ```
   Filter: t ILIKE Utf8("%nyc%")
     Projection: get_field(t.user, Utf8("address")) as t, lower(get_field(t.user, Utf8("email")))
       SubqueryAlias: t
         TableScan: test.parquet
   ```
   
   And this plan could potentially be much worse as it will now call `lower()` 
on *all* rows, rather than just the rows that pass the predicate, and discard 
most of the results.
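
To make the cost difference concrete, here is a minimal sketch in plain Rust (not DataFusion code). The `User` type, the sample rows, and all function names are hypothetical stand-ins for the two plan shapes above; the sketch only counts how often `lower()` would run in each.

```rust
/// Hypothetical row type standing in for the `user` struct column.
struct User {
    email: String,
    address: String,
}

/// Case-insensitive substring match, a stand-in for `ILIKE '%needle%'`.
fn ilike_contains(haystack: &str, needle: &str) -> bool {
    haystack.to_lowercase().contains(&needle.to_lowercase())
}

/// First plan shape: filter, then project.
/// `lower()` runs only on rows that pass the predicate.
/// Returns the projected values and the number of `lower()` calls.
fn filter_then_project(rows: &[User], needle: &str) -> (Vec<String>, usize) {
    let mut calls = 0usize;
    let out: Vec<String> = rows
        .iter()
        .filter(|u| ilike_contains(&u.address, needle))
        .map(|u| {
            calls += 1;
            u.email.to_lowercase()
        })
        .collect();
    (out, calls)
}

/// Second plan shape: project everything, then filter.
/// `lower()` runs on every row, and most results are then discarded.
fn project_then_filter(rows: &[User], needle: &str) -> (Vec<String>, usize) {
    let mut calls = 0usize;
    let out: Vec<String> = rows
        .iter()
        .map(|u| {
            calls += 1;
            (u.address.as_str(), u.email.to_lowercase())
        })
        .filter(|(address, _)| ilike_contains(address, needle))
        .map(|(_, email)| email)
        .collect();
    (out, calls)
}

/// Made-up sample data: two of the four addresses mention NYC.
fn sample_rows() -> Vec<User> {
    let mk = |email: &str, address: &str| User {
        email: email.to_string(),
        address: address.to_string(),
    };
    vec![
        mk("A@Example.com", "NYC, NY"),
        mk("B@Example.com", "Boston, MA"),
        mk("C@Example.com", "Austin, TX"),
        mk("D@Example.com", "Brooklyn, nyc"),
    ]
}
```

Both functions return the same rows; only the number of `lower()` calls differs, and the gap grows as the predicate becomes more selective.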
   
   
   I don't quite follow what DuckDB does, but at a really high level it 
almost seems like they are special-casing struct field accesses, which makes 
sense, as those expressions in particular will always be faster (and there is 
no costing / tradeoff involved).
   
   What if we made a special optimizer pass that only pushed down struct 
accesses (maybe "Expr::is_trivial" or something else equivalent)?
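
As a rough illustration of that idea, the sketch below uses a toy expression tree. It is not DataFusion's `Expr`, and `is_trivial`, `collect_field_accesses`, and the helper constructors are hypothetical names assumed for this example. Only struct field accesses over columns count as trivial, so a pass built on it would pull out `get_field(...)` and leave `lower(...)` where it is.

```rust
/// Toy expression tree; this is NOT DataFusion's `Expr`, only an illustration.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(String),
    GetField { base: Box<Expr>, field: String },
    ScalarFunction { name: String, args: Vec<Expr> },
}

impl Expr {
    /// Hypothetical check: true for expressions assumed to be always cheap,
    /// i.e. columns, literals, and field accesses built only from those.
    fn is_trivial(&self) -> bool {
        match self {
            Expr::Column(_) | Expr::Literal(_) => true,
            Expr::GetField { base, .. } => base.is_trivial(),
            Expr::ScalarFunction { .. } => false,
        }
    }
}

/// Collects the outermost trivial field accesses inside `expr`, without
/// duplicates. These are the only pieces the sketched pass would push down.
fn collect_field_accesses(expr: &Expr, out: &mut Vec<Expr>) {
    match expr {
        Expr::GetField { .. } if expr.is_trivial() => {
            if !out.contains(expr) {
                out.push(expr.clone());
            }
        }
        // A field access over a non-trivial base: look inside the base.
        Expr::GetField { base, .. } => collect_field_accesses(base, out),
        Expr::ScalarFunction { args, .. } => {
            for arg in args {
                collect_field_accesses(arg, out);
            }
        }
        Expr::Column(_) | Expr::Literal(_) => {}
    }
}

/// Helper constructors for building example expressions.
fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

fn get_field(base: Expr, field: &str) -> Expr {
    Expr::GetField { base: Box::new(base), field: field.to_string() }
}

fn func(name: &str, args: Vec<Expr>) -> Expr {
    Expr::ScalarFunction { name: name.to_string(), args }
}
```

For `lower(get_field(user, "email"))`, the whole expression is not trivial, but `collect_field_accesses` returns the inner `get_field(user, "email")`, so the field access could move toward the scan while `lower()` stays above the filter and no cost decision is needed.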


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

