alamb commented on PR #10221:
URL: https://github.com/apache/datafusion/pull/10221#issuecomment-2077771173

   > I updated the tests, but I'm still unsure if it's okay to ignore this 
change in behavior. It seems like this could be a regression in other cases 
where we might be converting things to dictionaries for no reason.
   
   I think the reason to convert to a dictonary is that the other side of the 
comparsion is already a dictionary, which thus avoids convering *both* 
arguments (though I may be missing something)
   
   Maybe @viirya  has some thoughts given he worked on coercion as part of 
https://github.com/apache/datafusion/pull/9459 recently
   
   > It would be nice to have some issue tracking this, or maybe there is one 
already.
   
   Perhaps https://github.com/apache/datafusion/issues/5928 ? 
   
   There appear to be a bunch of open tickets about coercion 
https://github.com/apache/datafusion/issues?q=is%3Aissue+is%3Aopen+coercion
   
   > A possibly easy change to make in the short-term is to refactor away from 
using `comparison_coercion` for non-comparisons and instead use coercion logic 
specifically tailored to those expressions.
   
   This seems reasonable to me (aka have special rules for `coerce`). Other 
functions like `BETWEEN` I think can be thought of as syntactic sugar for 
comparisons (namely `x > low AND x < high`) 
   
   > I just feels like we're playing Jenga with type coercion logic right now. 
The coercion logic should prioritize avoiding expensive casts, I think.
   
   The key to ensuring we don't break existing code is to rely on the tests. 
   
   I agree the type coercion logic could be improved -- specifically I think it 
needs to have some encapsulation rather than being spread out through a bunch 
of free functions.
   
   Is this something you are interested in helping out with? I think the first 
thing to do would be to try and explain how the current logic works (and in 
that process we will likely uncover ways to make it better). Then,  would 
improve the code the structure to encapsulate the logic (into structs / enums 
perhaps -- like I did recently in 
https://github.com/apache/datafusion/pull/10216).  Once the logic is more 
encapsulated, then I think we'll be able to reason about it and feel good about 
how it fits into the overall picture
   
   I think the difference in casting a scalar integer to a scalar dictionary is 
neglible. The difference casting a column to a different type is likely 
subtantial (though casting string --> dictionary doesn't require any data 
copying)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to