sdf-jkl commented on PR #18789:
URL: https://github.com/apache/datafusion/pull/18789#issuecomment-3564711137

   @alamb @2010YOUY01 Thanks for your reviews. I've implemented most of the 
suggested changes, but there are still a few nits here and there. I will 
self-review to point them out.
   
   A major question I have about the current logic: should we support preimage 
for `in_list` predicate expressions?
   
   Currently `inlist_simplifier` rewrites 
`expr IN (A, B, ...) --> (expr = A) OR (expr = B) OR ...` 
when `list.len()` is 1, or when `list.len()` is at most 
`THRESHOLD_INLINE_INLIST` and the left expression is a column expression:
   
https://github.com/apache/datafusion/blob/c7b339e0b016ace1d3c337f756eb684ce4bc57b5/datafusion/optimizer/src/simplify_expressions/inlist_simplifier.rs#L47-L55
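
To make the rule above concrete, here is a minimal, self-contained sketch of the rewrite. It uses a toy `Expr` enum rather than DataFusion's real types, and the value of `THRESHOLD_INLINE_INLIST` is an assumption made for the illustration:

```rust
// Toy expression tree: NOT DataFusion's `Expr`, just enough to show the rewrite.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Eq(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
    InList { expr: Box<Expr>, list: Vec<Expr> },
}

// Assumed value for this sketch only.
const THRESHOLD_INLINE_INLIST: usize = 3;

/// Rewrites `expr IN (A, B, ...)` into `(expr = A) OR (expr = B) OR ...`
/// when the list has exactly one element, or when the list is short and
/// `expr` is a plain column. Anything else is returned unchanged.
fn inline_in_list(e: Expr) -> Expr {
    match e {
        Expr::InList { expr, list }
            if !list.is_empty()
                && (list.len() == 1
                    || list.len() <= THRESHOLD_INLINE_INLIST
                        && matches!(expr.as_ref(), Expr::Column(_))) =>
        {
            list.into_iter()
                // One equality per list item; `expr` is cloned for each item,
                // which is why the rule avoids long lists and costly expressions.
                .map(|item| Expr::Eq(expr.clone(), Box::new(item)))
                // Chain the equalities left to right with OR.
                .reduce(|acc, eq| Expr::Or(Box::new(acc), Box::new(eq)))
                .expect("guard ensures the list is non-empty")
        }
        other => other,
    }
}
```

The per-item clone of the left-hand expression is the cost that the length threshold and the column-only restriction are meant to bound.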
   
   Originally, I was following the `unwrap_cast` implementation, which uses 
a separate `inlist` simplification logic specifically for cast expressions:
   
https://github.com/apache/datafusion/blob/12cb4cae1ce9020ddfa2f9890f9f4e7c1a43fccb/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs#L1913-L1961
   
   I think that using the existing `inlist_simplifier` would be more elegant and 
avoid duplicating code, but it would also let other expressions be simplified as 
`expr IN (A, B, ...) --> (expr = A) OR (expr = B) OR ...`.
   
   I believe we could improve the logic here:
   ```rust
   if !list.is_empty()
       && (
           // For lists with only 1 value we allow more complex expressions
           // to be simplified,
           // e.g. SUBSTR(c1, 2, 3) IN ('1') -> SUBSTR(c1, 2, 3) = '1'.
           // For more than one value we avoid repeating these potentially
           // expensive expressions.
           list.len() == 1
               || list.len() <= THRESHOLD_INLINE_INLIST
                   && expr.try_as_col().is_some()
               // Proposed: also inline for scalar functions that support preimage,
               // by adding a `preimage_support()` method to `ScalarUDFImpl`
               // and `ScalarUDF`.
               || list.len() <= THRESHOLD_INLINE_INLIST
                   && matches!(
                       expr.as_ref(),
                       Expr::ScalarFunction(ScalarFunction { func, .. })
                           if func.preimage_support()
                   )
       )
   ```
   This way, only scalar functions that explicitly support preimage will be 
simplified here.
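
As a rough illustration of the opt-in idea, the sketch below models the proposed flag as a trait method with a `false` default. Every name in it (`ScalarUdfLike`, `PreimageFn`, `PlainFn`, `Lhs`, `should_inline`) is hypothetical, `preimage_support()` does not exist in DataFusion today, and the threshold value is assumed:

```rust
// Hypothetical stand-in for a scalar UDF trait; NOT the real `ScalarUDFImpl`.
trait ScalarUdfLike {
    /// Proposed opt-in flag (hypothetical). Defaulting to `false` means
    /// existing functions keep their current behaviour unless they override it.
    fn preimage_support(&self) -> bool {
        false
    }
}

struct PreimageFn; // hypothetical function that opts in
struct PlainFn; // hypothetical function that keeps the default

impl ScalarUdfLike for PreimageFn {
    fn preimage_support(&self) -> bool {
        true
    }
}
impl ScalarUdfLike for PlainFn {}

// Simplified view of the left-hand side of `expr IN (...)`.
enum Lhs<'a> {
    Column(&'a str),
    ScalarFunction(&'a dyn ScalarUdfLike),
    Other,
}

// Assumed value for this sketch only.
const THRESHOLD_INLINE_INLIST: usize = 3;

/// Mirrors the proposed condition: always inline single-element lists;
/// inline short lists only for columns or for functions that opt in.
fn should_inline(lhs: &Lhs, list_len: usize) -> bool {
    if list_len == 0 {
        return false;
    }
    if list_len == 1 {
        return true;
    }
    list_len <= THRESHOLD_INLINE_INLIST
        && match lhs {
            Lhs::Column(_) => true,
            Lhs::ScalarFunction(func) => func.preimage_support(),
            Lhs::Other => false,
        }
}
```

With a default of `false`, a function such as the hypothetical `PlainFn` is still inlined only for single-element lists, so the change would not affect functions that never implement preimage.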
   
   Please let me know what you think about this approach.


