adriangb commented on PR #19365:
URL: https://github.com/apache/datafusion/pull/19365#issuecomment-3666301214

   We discussed in the meeting today.
   The conclusion was basically that the heuristic currently in place is 
trying to answer the question "is this expression more or less expensive to 
evaluate than its constituent parts?".
   For example, if the projection *adds* a computed column (`a,b,a+b`) the 
answer is "more expensive"; if it only drops columns (`a,b` instead of 
`a,b,c`) it is not.
   However, this is broken for things like struct field access or Parquet 
variant field access, where the expression `large_struct.small_int_field` might 
be considerably cheaper to evaluate than its constituent parts (i.e. reading a 
single field from a struct is cheaper than reading the whole thing and then 
slicing it).
   The most promising approach we discussed was adding a method to 
`PhysicalExpr` that lets the expression itself report whether it is cheaper or 
more expensive to evaluate than its constituent parts.
   There are other options; I'll open an issue to discuss them.
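
To make the "method on the expression" idea concrete, here is a minimal, self-contained sketch. It does not use DataFusion's types: `RelativeCost`, `ExprCost`, `cost_vs_inputs`, `projection_adds_work` and the toy `Column` / `Add` / `GetField` structs are all hypothetical names invented for illustration, and the conservative default ("assume more expensive") is an assumption, not something decided in the meeting.

```rust
/// Hypothetical answer to "how does evaluating this expression compare with
/// materializing its constituent parts?". Not part of DataFusion's API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RelativeCost {
    Cheaper,
    Same,
    MoreExpensive,
}

/// Hypothetical stand-in for the method that could be added to `PhysicalExpr`.
pub trait ExprCost {
    /// Assumed default: an expression that does not say otherwise is treated
    /// as more expensive than its inputs.
    fn cost_vs_inputs(&self) -> RelativeCost {
        RelativeCost::MoreExpensive
    }
}

/// Toy expression types; the fields exist only to make the examples readable.
#[allow(dead_code)]
pub struct Column(pub &'static str);

#[allow(dead_code)]
pub struct Add(pub Column, pub Column);

#[allow(dead_code)]
pub struct GetField {
    pub base: Column,
    pub field: &'static str,
}

/// A bare column reference costs the same as its input.
impl ExprCost for Column {
    fn cost_vs_inputs(&self) -> RelativeCost {
        RelativeCost::Same
    }
}

/// `a + b` keeps the default: it does extra work on top of reading `a` and `b`.
impl ExprCost for Add {}

/// `large_struct.small_int_field`: reading one field can be cheaper than
/// reading the whole struct and slicing it afterwards.
impl ExprCost for GetField {
    fn cost_vs_inputs(&self) -> RelativeCost {
        RelativeCost::Cheaper
    }
}

/// True if any expression in the projection reports itself as more expensive
/// than its inputs.
pub fn projection_adds_work(exprs: &[Box<dyn ExprCost>]) -> bool {
    exprs
        .iter()
        .any(|e| e.cost_vs_inputs() == RelativeCost::MoreExpensive)
}
```

With this shape, a projection of `a, b, a+b` reports added work because of `Add`, while a projection containing only column references and field accesses does not. A three-valued enum rather than a `bool` is just one possible design; it leaves room to distinguish "cheaper" from "no worse".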


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

