[
https://issues.apache.org/jira/browse/CALCITE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
pengzhiwei updated CALCITE-2630:
--------------------------------
Description:
Currently Calcite translate "IN" to "OR" expression when the count of IN's
operands less than "inSubQueryThreshold" and to "Join" when the count greater
than "inSubQueryThreshold" to get better performance.
However this translation to "JOIN" is so complex,especially when the "IN"
expression located in the "select" or "join on condition".For example:
{code:java}
select case when deptno in (1,2) then 0 else 1 end from emp
{code}
the logical plan generated as follow:
{code:java}
LogicalProject(EXPR$0=[CASE(CAST(CASE(=($9, 0), false, IS NOT NULL($13), true,
IS NULL($11), null, <($10, $9), null, false)):BOOLEAN NOT NULL, 0, 1)])
LogicalJoin(condition=[=($11, $12)], joinType=[left])
LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4],
SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$9], $f1=[$10],
DEPTNO0=[$7])
LogicalJoin(condition=[true], joinType=[inner])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
LogicalProject(ROW_VALUE=[$0], $f1=[true])
LogicalValues(tuples=[[{ 1 }, { 2 }]])
LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
LogicalProject(ROW_VALUE=[$0], $f1=[true])
LogicalValues(tuples=[[{ 1 }, { 2 }]])
{code}
The generated logical plan is so complex for such a simple sql!
I think we can treat "IN" as a function like "plus" and "minus".And there is no
translation spending on "IN" just keep it as it is.This would be much clear in
the logical plan!
In the compute stage,We can provide a "InExpression":
{code:java}
InExpression(left,condition0,condition1,...){code}
We can put all the constant conditions to a "Set".In that way,the
computational complexity can reduce from O(n)to O(1).
It would be much clear and have a good performance.And we have implement it in
our streaming-sql system.
was:
Currently Calcite translate "IN" to "OR" expression when the count of IN's
operands less than "inSubQueryThreshold" and to "Join" when the count greater
than "inSubQueryThreshold" to get better performance.
However this translation to "JOIN" is so complex,especially when the "IN"
expression located in the "select" or "join on condition".For example:
{code:java}
select case when deptno in (1,2) then 0 else 1 end from emp
{code}
the logical plan generated as follow:
{code:java}
LogicalProject(EXPR$0=[CASE(CAST(CASE(=($9, 0), false, IS NOT NULL($13), true,
IS NULL($11), null, <($10, $9), null, false)):BOOLEAN NOT NULL, 0, 1)])
LogicalJoin(condition=[=($11, $12)], joinType=[left])
LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4],
SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$9], $f1=[$10],
DEPTNO0=[$7])
LogicalJoin(condition=[true], joinType=[inner])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
LogicalProject(ROW_VALUE=[$0], $f1=[true])
LogicalValues(tuples=[[{ 1 }, { 2 }]])
LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
LogicalProject(ROW_VALUE=[$0], $f1=[true])
LogicalValues(tuples=[[{ 1 }, { 2 }]])
{code}
The generated logical plan is so complex for such a simple sql!
I think we can treat "IN" as a function like "plus" and "minus".And there is no
translation spending on "IN" just keep it as it is.This would be much clear in
the logical plan!
In the compute stage,We can provide a "InExpression":
{code:java}
InExpression(left,condition0,condition1,...){code}
We can put all the constant conditions to a "Set".In that way,the
computational complexity can reduce from O(n) to O(1).
It would be much clear and have a good performance.And we have implement it in
our streaming-sql system.
> Convert SqlInOperator to a IN-Expression
> ----------------------------------------
>
> Key: CALCITE-2630
> URL: https://issues.apache.org/jira/browse/CALCITE-2630
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.17.0
> Reporter: pengzhiwei
> Assignee: Julian Hyde
> Priority: Major
>
> Currently Calcite translate "IN" to "OR" expression when the count of IN's
> operands less than "inSubQueryThreshold" and to "Join" when the count greater
> than "inSubQueryThreshold" to get better performance.
> However this translation to "JOIN" is so complex,especially when the "IN"
> expression located in the "select" or "join on condition".For example:
>
> {code:java}
> select case when deptno in (1,2) then 0 else 1 end from emp
> {code}
> the logical plan generated as follow:
>
>
> {code:java}
> LogicalProject(EXPR$0=[CASE(CAST(CASE(=($9, 0), false, IS NOT NULL($13),
> true, IS NULL($11), null, <($10, $9), null, false)):BOOLEAN NOT NULL, 0, 1)])
> LogicalJoin(condition=[=($11, $12)], joinType=[left])
> LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4],
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$9], $f1=[$10],
> DEPTNO0=[$7])
> LogicalJoin(condition=[true], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
> LogicalProject(ROW_VALUE=[$0], $f1=[true])
> LogicalValues(tuples=[[{ 1 }, { 2 }]])
> LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(ROW_VALUE=[$0], $f1=[true])
> LogicalValues(tuples=[[{ 1 }, { 2 }]])
> {code}
> The generated logical plan is so complex for such a simple sql!
>
> I think we can treat "IN" as a function like "plus" and "minus".And there is
> no translation spending on "IN" just keep it as it is.This would be much
> clear in the logical plan!
> In the compute stage,We can provide a "InExpression":
>
> {code:java}
> InExpression(left,condition0,condition1,...){code}
>
> We can put all the constant conditions to a "Set".In that way,the
> computational complexity can reduce from O(n)to O(1).
> It would be much clear and have a good performance.And we have implement it
> in our streaming-sql system.
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)