westonpace commented on PR #34834:
URL: https://github.com/apache/arrow/pull/34834#issuecomment-1624731983

   > And substrait doesn't have an "is_in" like function? (or are there plans 
for that?)
   >  (this conversion seems unfortunate, as "is_in" can be more efficient than 
the equivalent or-list)
   
   It's an interesting point.  We have things like this outside of expressions 
too.  For example, the "join" node doesn't distinguish between an equality join 
(which can be done efficiently with a hashmap) and a non-equality join (which 
cannot).  In that case we actually have both representations.  The one people 
typically use is "JoinRel", a logical operator that is allowed to be more 
generic without concern for efficiency.  The other is "HashJoinRel", which is 
more specific / physical but is typically not created by producers (instead, 
planners or optimizers convert from one to the other).
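
   To illustrate why the equality case matters, here is a minimal Python 
sketch (not Acero code; `hash_join` and `nested_loop_join` are hypothetical 
helper names, and rows are assumed to be plain dicts).  An equality join can 
build a hashmap on one side and probe it, while an arbitrary predicate has to 
be checked against every pair of rows:

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Equality join: build a hashmap on `right`, then probe it with `left`."""
    table = defaultdict(list)
    for row in right:
        table[row[key]].append(row)
    # Each left row only visits the right rows that share its key.
    return [(l, r) for l in left for r in table.get(l[key], [])]


def nested_loop_join(left, right, predicate):
    """General join: the predicate is evaluated for every pair of rows."""
    return [(l, r) for l in left for r in right if predicate(l, r)]


left = [{"k": 1}, {"k": 2}]
right = [{"k": 2}, {"k": 3}]

equal_pairs = hash_join(left, right, "k")
less_pairs = nested_loop_join(left, right, lambda l, r: l["k"] < r["k"])
```

   A planner that sees a generic join with an equality condition could 
choose the first strategy; anything else falls back to the second.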
   
   I think this is interesting because "is_in" vs. "singular-or-list" is 
basically a logical vs. physical distinction for expressions, which I don't 
think I've really considered before, but I agree with you that it's valid.
   
   In any case, it will be easy enough for Acero's converter to recognize the 
cases that can collapse to `is_in` and use it where appropriate.  I've created 
https://github.com/apache/arrow/issues/36535 to track this.
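
   As a rough sketch of what "recognize the cases that can collapse" might 
look like (this is not the actual Acero converter; the `FieldRef`, `Literal`, 
and `Call` classes and the `convert_singular_or_list` function are 
hypothetical, and the sketch assumes the collapsible case is the one where 
every option is a literal):

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FieldRef:
    name: str


@dataclass(frozen=True)
class Literal:
    value: Any


@dataclass(frozen=True)
class Call:
    function: str
    args: tuple
    value_set: tuple = ()  # only used by "is_in" in this sketch


def convert_singular_or_list(value, options):
    """Convert a (value, options) or-list into an expression tree.

    Assumption: if every option is a literal, the list can become a single
    "is_in" call; otherwise fall back to a chain of equal/or calls.
    """
    if not options:
        raise ValueError("an or-list needs at least one option")
    if all(isinstance(o, Literal) for o in options):
        return Call("is_in", (value,), tuple(o.value for o in options))
    expr = Call("equal", (value, options[0]))
    for option in options[1:]:
        expr = Call("or", (expr, Call("equal", (value, option))))
    return expr


x = FieldRef("x")
collapsed = convert_singular_or_list(x, [Literal(1), Literal(2), Literal(3)])
fallback = convert_singular_or_list(x, [Literal(1), FieldRef("y")])
```

   Here `collapsed` is a single `is_in` call with a value set of three 
literals, while `fallback` stays an or-chain because one option is a field 
reference rather than a literal.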


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
