Dandandan commented on issue #488: URL: https://github.com/apache/arrow-datafusion/issues/488#issuecomment-857405281
Hey @msathis, that would be great. Effectively it means rewriting queries like:

```
SELECT a, b FROM x WHERE a IN (SELECT b FROM t)
```

into something like the following (the syntax is only illustrative, not valid SQL):

```
SELECT a, b FROM x SEMI JOIN t ON x.a = t.b
```

So the work will be:

* adding `IN` as an option to the `Expr` enum and adding it to the `sql/planner`;
* extracting applicable `IN` expressions and transforming them into the (left and right) join columns;
* converting them to a semi join (a join with `JoinType::Semi`), either directly in the planner and/or in an optimization rule (e.g. translating a cross join into a semi join). The first option would be fine for now.

I think we can return an error in case the logical plan still contains an `IN` in an expression somewhere.

One complication I saw is that adding a `LogicalPlan` to the `Expr` (for encoding `IN`) is not trivial, because `Expr` derives `Eq` and a few other traits that `LogicalPlan` does not implement.
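The steps above can be sketched in Rust roughly as follows. This is a minimal, hypothetical sketch and not DataFusion's API: `LogicalPlan`, `Expr`, `JoinType`, `Subquery`, `rewrite_in_to_semi_join` and `check_no_in_subquery` are toy stand-ins invented for illustration. It assumes `IN` only appears as a top-level filter predicate, with a bare column on the left and a subquery that projects exactly one column. For the `Eq` complication it assumes one possible workaround, wrapping the subquery plan in a newtype that compares by pointer identity (`Arc::ptr_eq`), so `Expr` can keep deriving `PartialEq`/`Eq`; the project may well choose a different approach.

```rust
// Hypothetical toy types for illustration only; these are NOT DataFusion's
// actual `Expr`/`LogicalPlan` definitions.
use std::sync::Arc;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JoinType { Inner, Semi }

/// Simplified logical plan. Like the plan described in the comment, it does
/// not implement `PartialEq`/`Eq`.
#[derive(Debug, Clone)]
pub enum LogicalPlan {
    TableScan { table: String },
    Projection { input: Arc<LogicalPlan>, columns: Vec<String> },
    Filter { input: Arc<LogicalPlan>, predicate: Expr },
    /// Equi-join on (left column, right column) pairs.
    Join {
        left: Arc<LogicalPlan>,
        right: Arc<LogicalPlan>,
        on: Vec<(String, String)>,
        join_type: JoinType,
    },
}

/// Assumed workaround for the `Eq` problem: two subqueries are equal only if
/// they point to the same `Arc` allocation.
#[derive(Debug, Clone)]
pub struct Subquery(pub Arc<LogicalPlan>);

impl PartialEq for Subquery {
    fn eq(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.0, &other.0)
    }
}
impl Eq for Subquery {}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Expr {
    Column(String),
    /// `expr IN (subquery)`
    InSubquery { expr: Box<Expr>, subquery: Subquery },
}

/// Rewrites `Filter(col IN (SELECT c FROM ...))` into a semi join on
/// `col = c`; all other nodes are copied with their children rewritten.
pub fn rewrite_in_to_semi_join(plan: &LogicalPlan) -> LogicalPlan {
    match plan {
        LogicalPlan::TableScan { .. } => plan.clone(),
        LogicalPlan::Projection { input, columns } => LogicalPlan::Projection {
            input: Arc::new(rewrite_in_to_semi_join(input)),
            columns: columns.clone(),
        },
        LogicalPlan::Join { left, right, on, join_type } => LogicalPlan::Join {
            left: Arc::new(rewrite_in_to_semi_join(left)),
            right: Arc::new(rewrite_in_to_semi_join(right)),
            on: on.clone(),
            join_type: *join_type,
        },
        LogicalPlan::Filter { input, predicate } => {
            let new_input = Arc::new(rewrite_in_to_semi_join(input));
            if let Expr::InSubquery { expr, subquery } = predicate {
                // Only the simple case: a bare column on the left and a
                // subquery that projects exactly one column.
                if let (Expr::Column(left_col), LogicalPlan::Projection { columns, .. }) =
                    (&**expr, &*subquery.0)
                {
                    if columns.len() == 1 {
                        return LogicalPlan::Join {
                            left: new_input,
                            right: Arc::new(rewrite_in_to_semi_join(&subquery.0)),
                            on: vec![(left_col.clone(), columns[0].clone())],
                            join_type: JoinType::Semi,
                        };
                    }
                }
            }
            // Anything we cannot rewrite is left as a filter.
            LogicalPlan::Filter { input: new_input, predicate: predicate.clone() }
        }
    }
}

/// Returns an error if an `IN (subquery)` filter survived the rewrite.
pub fn check_no_in_subquery(plan: &LogicalPlan) -> Result<(), String> {
    match plan {
        LogicalPlan::TableScan { .. } => Ok(()),
        LogicalPlan::Projection { input, .. } => check_no_in_subquery(input),
        LogicalPlan::Filter { predicate: Expr::InSubquery { .. }, .. } => {
            Err("unsupported IN (subquery) left in logical plan".to_string())
        }
        LogicalPlan::Filter { input, .. } => check_no_in_subquery(input),
        LogicalPlan::Join { left, right, .. } => {
            check_no_in_subquery(left)?;
            check_no_in_subquery(right)
        }
    }
}
```

A semi join keeps each left row that has at least one match on the right and outputs only the left columns, which is why it can stand in for `a IN (...)` in a `WHERE` clause. The pointer-equality choice keeps `Expr` comparable without requiring `LogicalPlan: PartialEq`, at the cost that two structurally identical subqueries built separately compare as unequal.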
