[
https://issues.apache.org/jira/browse/CALCITE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16656363#comment-16656363
]
Stamatis Zampetakis commented on CALCITE-2630:
----------------------------------------------
{quote}
I don't very much like the idea of adding another Rex operator.
{quote}
[~julianhyde] : I suppose you mean not creating a RexCall with SqlInOperator
but this is possible anyways if you pass from the RelBuilder and RexBuilder.
{quote}
I think it is much clear and good performance than translate "IN" to a complex
"Join".
{quote}
[~pzw2018] : By adjusting the subQuertThreshold you can avoid the translation
to a join and have always the OR expansion which does not seem much more
complicated than the equivalent IN.
If you really want to have the IN in the RexCall, then the discussion so far in
the dev list is leaning towards creating appropriate rules.
> Convert SqlInOperator to In-Expression
> --------------------------------------
>
> Key: CALCITE-2630
> URL: https://issues.apache.org/jira/browse/CALCITE-2630
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.17.0
> Reporter: pengzhiwei
> Assignee: pengzhiwei
> Priority: Major
>
> Currently Calcite translate "IN" to "OR" expression when the count of IN's
> operands less than "inSubQueryThreshold" or to "Join" when the operands
> count greater than "inSubQueryThreshold" to get better performance.
> However this translation to "JOIN" is so complex. Especially when the "IN"
> expression located in the "select" or "join on condition".
> For example:
> {code:java}
> select case when deptno in (1,2) then 0 else 1 end from emp
> {code}
> the logical plan generated as follow:
> {code:java}
> LogicalProject(EXPR$0=[CASE(CAST(CASE(=($9, 0), false, IS NOT NULL($13),
> true, IS NULL($11), null, <($10, $9), null, false)):BOOLEAN NOT NULL, 0, 1)])
> LogicalJoin(condition=[=($11, $12)], joinType=[left])
> LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4],
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$9], $f1=[$10],
> DEPTNO0=[$7])
> LogicalJoin(condition=[true], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
> LogicalProject(ROW_VALUE=[$0], $f1=[true])
> LogicalValues(tuples=[[{ 1 }, { 2 }]])
> LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(ROW_VALUE=[$0], $f1=[true])
> LogicalValues(tuples=[[{ 1 }, { 2 }]])
> {code}
> The generated logical plan is so complex for such a simple sql!
> I think we can treat "IN" as a function like "plus" and "minus".So there is
> no translation on "IN" and just keep it as it is.This would be much clear in
> the logical plan!
> In the execute stage,We can provide a "InExpression":
> {code:java}
> InExpression(left,condition0,condition1,...) {code}
> We can put all the constant conditions to a "Set".In that way,the
> computational complexity can reduce from O(n)to O(1).
> It would be much clear and have a good performance.
> PS: "In sub-query" is not included in our talk.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)