[ https://issues.apache.org/jira/browse/CALCITE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16654859#comment-16654859 ]
pengzhiwei edited comment on CALCITE-2630 at 10/18/18 9:01 AM:
---------------------------------------------------------------

_However, when it comes to IN expressions with subqueries I am not sure if it will be beneficial. In particular_

[~zabetak], "IN sub-query" is not included in this plan, as there can be only one sub-query in an "IN" expression and it cannot be mixed with other expressions. An "IN sub-query" is more like a semi-join, whereas an "IN expression" is more like a function than a "join".

_Moreover, note that the existing runtime does not provide an implementation for the IN operator_

We can implement an InExpression for the Calcite runtime. Other SQL engines built on Calcite, such as Flink, can implement their own "InExpression" as well. The current translation of "IN expressions" to a "join" is much harder for other SQL engines to implement.


was (Author: pzw2018):
[~zabetak], "IN sub-query" is not included in this plan, as there can be only one sub-query in an "IN" expression and it cannot be mixed with other expressions. An "IN sub-query" is more like a semi-join, whereas an "IN expression" is more like a function than a "join".

> Convert SqlInOperator to In-Expression
> --------------------------------------
>
>                 Key: CALCITE-2630
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2630
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.17.0
>            Reporter: pengzhiwei
>            Assignee: Julian Hyde
>            Priority: Major
>
> Currently Calcite translates "IN" to an "OR" expression when the number of
> IN operands is less than "inSubQueryThreshold", or to a "Join" when the
> operand count is greater than "inSubQueryThreshold", to get better performance.
> However, this translation to "JOIN" is complex, especially when the "IN"
> expression is located in the "select" list or in a "join on" condition.
> For example:
> {code:java}
> select case when deptno in (1,2) then 0 else 1 end from emp
> {code}
> the logical plan is generated as follows:
> {code:java}
> LogicalProject(EXPR$0=[CASE(CAST(CASE(=($9, 0), false, IS NOT NULL($13), true, IS NULL($11), null, <($10, $9), null, false)):BOOLEAN NOT NULL, 0, 1)])
>   LogicalJoin(condition=[=($11, $12)], joinType=[left])
>     LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$9], $f1=[$10], DEPTNO0=[$7])
>       LogicalJoin(condition=[true], joinType=[inner])
>         LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>         LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
>           LogicalProject(ROW_VALUE=[$0], $f1=[true])
>             LogicalValues(tuples=[[{ 1 }, { 2 }]])
>     LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
>       LogicalProject(ROW_VALUE=[$0], $f1=[true])
>         LogicalValues(tuples=[[{ 1 }, { 2 }]])
> {code}
> The generated logical plan is very complex for such a simple SQL statement!
> I think we can treat "IN" as a function like "plus" and "minus", so that
> "IN" is not translated at all and is kept as it is. This would make the
> logical plan much clearer.
> At the execution stage, we can provide an "InExpression":
> {code:java}
> InExpression(left,condition0,condition1,...)
> {code}
> We can put all the constant conditions into a "Set". That way, the cost of
> each membership check can be reduced from O(n) to O(1).
> The plan would be much clearer and the performance would be good.
> PS: "IN sub-query" is outside the scope of this discussion.
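
To make the "Set" idea from the description concrete, here is a minimal sketch of what such a runtime expression might look like. It is not an existing Calcite class: the name InExpression comes from the proposal, while the constructor shape, the eval method and the hasNull field are assumptions made for illustration. The sketch also assumes that the operands have already been coerced to the same Java type, and it follows SQL three-valued logic by returning null for UNKNOWN.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the proposed runtime expression; not part of Calcite. */
final class InExpression {
  // Non-null constants of the IN list, hashed once when the expression is built.
  private final Set<Object> constants = new HashSet<>();
  // True if the IN list contains a NULL literal.
  private boolean hasNull = false;

  InExpression(List<?> conditions) {
    for (Object c : conditions) {
      if (c == null) {
        hasNull = true;
      } else {
        constants.add(c);
      }
    }
  }

  /**
   * Evaluates "left IN (conditions...)".
   * Returns TRUE, FALSE, or null, where null stands for SQL UNKNOWN.
   */
  Boolean eval(Object left) {
    if (left == null) {
      return null; // NULL IN (...) is UNKNOWN
    }
    if (constants.contains(left)) {
      return Boolean.TRUE; // expected O(1) hash lookup instead of an O(n) scan
    }
    // No match: UNKNOWN if the list contained NULL, otherwise FALSE.
    return hasNull ? null : Boolean.FALSE;
  }

  public static void main(String[] args) {
    InExpression in = new InExpression(Arrays.asList(1, 2));
    System.out.println(in.eval(1));
    System.out.println(in.eval(7));
  }
}
```

With this shape, "deptno in (1,2)" would build the set once and then cost one hash lookup per row, with no extra join or aggregate in the plan. How NULL operands are treated is the part an engine would most likely need to adapt to its own conventions.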