[
https://issues.apache.org/jira/browse/CALCITE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16656410#comment-16656410
]
pengzhiwei edited comment on CALCITE-2630 at 10/19/18 7:57 AM:
---------------------------------------------------------------
_By adjusting the subQuertThreshold you can avoid the translation to a join
and have always the OR expansion which does not seem much more complicated than
the equivalent IN._
[~zabetak] The problem I meet in our engine is that when I adjust the
_subQuertThreshold_ to a large value .e.g. 1000,the execution time by OR
expression is
intolerable.
And I think that keeping SqlInOperator in RexCall should not affect other
existing simplify rules.We can also add some new simplify rule for IN.
For example,we can convert the or expression "x=constant1 or x= constant2..."
to "x in (constant1,constant2...)" to reduce the computational complexity for
O(n) to O(1).
was (Author: pzw2018):
_By adjusting the subQuertThreshold you can avoid the translation to a join
and have always the OR expansion which does not seem much more complicated than
the equivalent IN._
[~zabetak] The problem I meet in our engine is that when I adjust the
_subQuertThreshold_ to a large value .e.g. 1000,the execution time by OR
expression is
intolerable.
And I think the keep SqlInOperator in RexCall should not affect other existing
simplify rules.We can also add some new simplify rule for IN.
For example,we can convert the or expression "x=constant1 or x= constant2..."
to "x in (constant1,constant2...)" to reduce the computational complexity for
O(n) to O(1).
> Convert SqlInOperator to In-Expression
> --------------------------------------
>
> Key: CALCITE-2630
> URL: https://issues.apache.org/jira/browse/CALCITE-2630
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.17.0
> Reporter: pengzhiwei
> Assignee: pengzhiwei
> Priority: Major
>
> Currently Calcite translate "IN" to "OR" expression when the count of IN's
> operands less than "inSubQueryThreshold" or to "Join" when the operands
> count greater than "inSubQueryThreshold" to get better performance.
> However this translation to "JOIN" is so complex. Especially when the "IN"
> expression located in the "select" or "join on condition".
> For example:
> {code:java}
> select case when deptno in (1,2) then 0 else 1 end from emp
> {code}
> the logical plan generated as follow:
> {code:java}
> LogicalProject(EXPR$0=[CASE(CAST(CASE(=($9, 0), false, IS NOT NULL($13),
> true, IS NULL($11), null, <($10, $9), null, false)):BOOLEAN NOT NULL, 0, 1)])
> LogicalJoin(condition=[=($11, $12)], joinType=[left])
> LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4],
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$9], $f1=[$10],
> DEPTNO0=[$7])
> LogicalJoin(condition=[true], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
> LogicalProject(ROW_VALUE=[$0], $f1=[true])
> LogicalValues(tuples=[[{ 1 }, { 2 }]])
> LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(ROW_VALUE=[$0], $f1=[true])
> LogicalValues(tuples=[[{ 1 }, { 2 }]])
> {code}
> The generated logical plan is so complex for such a simple sql!
> I think we can treat "IN" as a function like "plus" and "minus".So there is
> no translation on "IN" and just keep it as it is.This would be much clear in
> the logical plan!
> In the execute stage,We can provide a "InExpression":
> {code:java}
> InExpression(left,condition0,condition1,...) {code}
> We can put all the constant conditions to a "Set".In that way,the
> computational complexity can reduce from O(n)to O(1).
> It would be much clear and have a good performance.
> PS: "In sub-query" is not included in our talk.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)