Dandandan commented on issue #488:
URL: 
https://github.com/apache/arrow-datafusion/issues/488#issuecomment-857405281


   Hey @msathis that would be great.
   
   Effectively it means rewriting queries from:
   
   ```
   SELECT a, b
   FROM 
   x
   WHERE a in (select b from t)
   ``` 
   
   into the following (pseudo-SQL, not valid syntax):
   
   ```
   SELECT a, b
   FROM
   x
   SEMI JOIN t ON a=b
   ```
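
To illustrate why the two forms are equivalent, here is a minimal sketch over in-memory rows. It is not DataFusion code: the `Row` type and both function names are made up for illustration, and NULL handling is ignored.

```rust
use std::collections::HashSet;

/// A row of the hypothetical table `x`: columns `a` and `b`.
type Row = (i64, &'static str);

/// `SELECT a, b FROM x WHERE a IN (SELECT b FROM t)`, evaluated naively:
/// every row of `x` rescans the subquery result `t_b`.
fn filter_in_subquery(x: &[Row], t_b: &[i64]) -> Vec<Row> {
    x.iter().filter(|(a, _)| t_b.contains(a)).cloned().collect()
}

/// `x SEMI JOIN t ON a = b` as a hash semi join: build a set from `t.b`,
/// then keep each row of `x` at most once if its `a` is in the set.
fn semi_join(x: &[Row], t_b: &[i64]) -> Vec<Row> {
    let build: HashSet<i64> = t_b.iter().copied().collect();
    x.iter().filter(|(a, _)| build.contains(a)).cloned().collect()
}
```

With `x = [(1, "p"), (2, "q"), (3, "r")]` and `t.b = [2, 2, 3, 5]`, both functions return `[(2, "q"), (3, "r")]`. The duplicate `2` in `t` does not duplicate the output row, which is what distinguishes a semi join from an inner join.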
   
   So the work will be
   
   * adding `IN` as an option to the `Expr` enum and adding it to the 
`sql/planner`.
   * extracting each applicable `IN` expression and transforming it to (left 
and right) columns
   * converting it to a semi join (a join with `JoinType::Semi`), either 
directly in the planner and/or by adding an optimization rule (e.g. translating 
a cross join to a semi join). The first would be fine for now.
   
   I think we can return an error in case the logical plan still contains an 
`IN` in an expression somewhere.
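
The steps above, including the error fallback, could look roughly like the following sketch. All types here (`Expr`, `Plan`, `JoinType`) are simplified toy stand-ins, not DataFusion's actual definitions, and `rewrite_in_to_semi_join` is a hypothetical name. The toy `Plan` derives `PartialEq` for convenience, which the real logical plan does not.

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum JoinType { Inner, Semi }

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    /// `expr IN (<subquery>)`: the subquery is a whole plan.
    InSubquery { expr: Box<Expr>, subquery: Box<Plan> },
    And(Box<Expr>, Box<Expr>),
}

#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Toy scan with its projection folded in.
    Scan { table: String, columns: Vec<String> },
    Filter { predicate: Expr, input: Box<Plan> },
    Join { left: Box<Plan>, right: Box<Plan>, on: (String, String), join_type: JoinType },
}

/// True if an `IN` subquery is still present anywhere in the expression.
fn contains_in(expr: &Expr) -> bool {
    match expr {
        Expr::Column(_) => false,
        Expr::InSubquery { .. } => true,
        Expr::And(l, r) => contains_in(l) || contains_in(r),
    }
}

/// The right-hand join column: the subquery must produce exactly one column.
fn single_output_column(plan: &Plan) -> Result<String, String> {
    match plan {
        Plan::Scan { columns, .. } if columns.len() == 1 => Ok(columns[0].clone()),
        Plan::Filter { input, .. } => single_output_column(input),
        other => Err(format!("IN subquery must produce one column: {:?}", other)),
    }
}

/// Rewrites `Filter(a IN (subquery))` into a semi join; errors if an `IN`
/// survives in a position this sketch does not handle.
fn rewrite_in_to_semi_join(plan: Plan) -> Result<Plan, String> {
    match plan {
        // The applicable case: the whole filter predicate is `col IN (...)`.
        Plan::Filter { predicate: Expr::InSubquery { expr, subquery }, input } => {
            let left_col = match *expr {
                Expr::Column(name) => name,
                other => return Err(format!("unsupported IN operand: {:?}", other)),
            };
            let right = rewrite_in_to_semi_join(*subquery)?;
            let right_col = single_output_column(&right)?;
            Ok(Plan::Join {
                left: Box::new(rewrite_in_to_semi_join(*input)?),
                right: Box::new(right),
                on: (left_col, right_col),
                join_type: JoinType::Semi,
            })
        }
        // Any other filter: an `IN` left inside it is reported as an error.
        Plan::Filter { predicate, input } => {
            if contains_in(&predicate) {
                return Err(format!("IN not supported here: {:?}", predicate));
            }
            Ok(Plan::Filter { predicate, input: Box::new(rewrite_in_to_semi_join(*input)?) })
        }
        Plan::Join { left, right, on, join_type } => Ok(Plan::Join {
            left: Box::new(rewrite_in_to_semi_join(*left)?),
            right: Box::new(rewrite_in_to_semi_join(*right)?),
            on,
            join_type,
        }),
        scan @ Plan::Scan { .. } => Ok(scan),
    }
}
```

In this sketch only a filter whose entire predicate is a single `IN` is converted. Something like `a IN (...) AND c` falls through to the error branch, matching the "return an error for now" suggestion.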
   
   One complication I saw is that adding a `LogicalPlan` to the `Expr` (for 
encoding `IN`) is not trivial, because `Expr` derives some traits (`Eq` etc.) 
that the logical plan does not implement.
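
To make the complication concrete: a derived comparison trait on an enum requires every field type to implement that trait. One possible workaround, shown purely as an assumption and not as what DataFusion does, is to wrap the plan in a newtype with a hand-written `PartialEq`. The `Plan`, `SubqueryRef`, and `Expr` types below are hypothetical stand-ins.

```rust
use std::sync::Arc;

/// Stand-in for a plan type that does NOT implement `PartialEq`.
#[derive(Debug)]
struct Plan {
    table: String,
}

/// Hypothetical wrapper so that `Expr` can keep its derived `PartialEq`.
#[derive(Debug, Clone)]
struct SubqueryRef(Arc<Plan>);

impl PartialEq for SubqueryRef {
    /// Two references are equal only if they point at the same plan
    /// allocation; structurally identical plans compare as different.
    fn eq(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.0, &other.0)
    }
}

/// Without the wrapper, putting `Arc<Plan>` directly in this enum would make
/// `#[derive(PartialEq)]` fail to compile, since `Plan` lacks `PartialEq`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    InSubquery { expr: Box<Expr>, subquery: SubqueryRef },
}
```

Pointer equality is a deliberate simplification here. Whether it is acceptable depends on how expression equality is used elsewhere, for example in optimizer rules that compare expressions.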
   
   
   
   



