alamb opened a new issue, #4089:
URL: https://github.com/apache/arrow-datafusion/issues/4089

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   You can write a query like
   
   ```sql
   select ... from foo where id in (4)
   ```
   
   SQL like this is often generated by tools that handle an arbitrary number of ids, so a single-element list is common.
   
   We have a specialized InList implementation (e.g. see 
https://github.com/apache/arrow-datafusion/pull/4057), but for single values it 
is still faster to use a standard equality predicate.
   
   **Describe the solution you'd like**
   
   As mentioned by @jackwener and @Dandandan in 
https://github.com/apache/arrow-datafusion/pull/4057#discussion_ we should 
rewrite `IN` lists that have only a few elements.
   
   We should definitely simplify `<left> IN (<expr>)` to `<left> = <expr>` as 
that will be better in all cases. 
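
The single-value rewrite can be sketched as follows. This is only an illustration on a toy expression tree: the `Expr` enum and the function name `simplify_single_value_in_list` are hypothetical and are not DataFusion's actual types or API. It also handles only the top-level node, whereas a real simplifier rule would run as part of a tree rewrite.

```rust
// Toy expression tree for illustration only; this is NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Literal(i64),
    /// `left = right`
    Eq(Box<Expr>, Box<Expr>),
    /// `expr IN (list...)`
    InList { expr: Box<Expr>, list: Vec<Expr> },
}

/// Rewrite `<left> IN (<expr>)` to `<left> = <expr>`.
/// Any other expression, including longer IN lists, is returned unchanged.
pub fn simplify_single_value_in_list(e: Expr) -> Expr {
    match e {
        Expr::InList { expr, mut list } if list.len() == 1 => {
            // Take ownership of the only list element.
            let only = list.pop().expect("length checked in the guard");
            Expr::Eq(expr, Box::new(only))
        }
        other => other,
    }
}
```

Applied to the toy equivalent of `id IN (4)`, the function returns the toy equivalent of `id = 4`; a list with two or more elements passes through untouched.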
   
   **Describe alternatives you've considered**
   
   We could potentially also rewrite `<left> IN (<expr>, <expr2>, .. <exprN>)` 
to `<left> = <expr> OR <left> = <expr2> OR .. <left> = <exprN>`
   
   However, at some point the InList expression is faster to evaluate, and that 
break-even point depends on the cost of evaluating `<left>`. Thus I suggest we 
only rewrite single-value IN lists.
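
For comparison, the more general `OR` expansion might look like the sketch below. Again, the `Expr` enum, the function `expand_in_list`, and the `max_items` cutoff are hypothetical names on a toy tree, not DataFusion's API. Note that `<left>` is cloned once per list element, which mirrors why the break-even point depends on how expensive `<left>` is to evaluate.

```rust
// Toy expression tree for illustration only; this is NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Literal(i64),
    /// `left = right`
    Eq(Box<Expr>, Box<Expr>),
    /// `left OR right`
    Or(Box<Expr>, Box<Expr>),
    /// `expr IN (list...)`
    InList { expr: Box<Expr>, list: Vec<Expr> },
}

/// Expand `<left> IN (e1, .., eN)` into `<left> = e1 OR .. OR <left> = eN`
/// when the list has between 1 and `max_items` elements (a hypothetical cutoff).
/// Everything else is returned unchanged.
pub fn expand_in_list(e: Expr, max_items: usize) -> Expr {
    match e {
        Expr::InList { expr, list } if !list.is_empty() && list.len() <= max_items => list
            .into_iter()
            // One equality per list element; `<left>` is duplicated each time.
            .map(|item| Expr::Eq(expr.clone(), Box::new(item)))
            // Fold the equalities into a left-deep chain of ORs.
            .reduce(|acc, eq| Expr::Or(Box::new(acc), Box::new(eq)))
            .expect("list is non-empty"),
        other => other,
    }
}
```

With `max_items` set to 1 this reduces to the single-value rewrite suggested above; larger cutoffs would need benchmarking against the specialized InList implementation.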
   
   
   
   **Additional context**
   This is a good first issue because there are several existing examples of 
code and tests to follow.
   
   You can find simplify rules here: 
https://github.com/apache/arrow-datafusion/blob/10e64dc013ba210ab1f6c2a3c02c66aef4a0e802/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs#L329-L339
   
   

