[
https://issues.apache.org/jira/browse/CALCITE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16655558#comment-16655558
]
Julian Hyde edited comment on CALCITE-2630 at 10/18/18 4:45 PM:
----------------------------------------------------------------
I don't very much like the idea of adding another Rex operator. A new operator
-- especially a boolean one, which needs to participate in all kinds of
transformations, simplifications, and 3-valued logic -- is a lot of work.
If we're talking about the case "column in (constant1, constant2, ...,
constantN)" then we already have two ways:
* First, "column = constant1 OR column = constant2 ... OR column = constantN"
(and note that OR is an n-ary operator in Rex land, so there's one call to OR
with N arguments).
* Second, "RexSubQuery(SqlStdOperatorTable.IN, columnRef, LogicalValues(...))".
The latter form is a hybrid scalar/relational (Rel inside Rex is unusual) but
we could make it work. With both forms, RelToSql could recognize the form and
translate it to an IN SqlCall, and if you have another code generator, you
could generate code accordingly.
To summarize: you can have your optimized physical IN operator in your
generated code, but it doesn't need to exist as an operator in Rex land.
was (Author: julianhyde):
I don't very much like the idea of adding another Rex operator. A new operator
-- especially a boolean one, which needs to participate in all kinds of
transformations, simplifications, and 3-valued logic -- is a lot of work.
If we're talking about the case "column in (constant1, constant2, ...,
constantN)" then we already have two ways:
* First, "column = constant1 OR column = constant2 ... OR column = constantN"
(and note that OR is an n-ary operator in Rex land, so there's one call to OR
with N arguments).
* Second, "RexSubQuery(SqlStdOperatorTable.IN, columnRef, LogicalValues(...))".
The latter form is a hybrid scalar/relational (Rel inside Rex is unusual) but
we could make it work. With both forms, RelToSql could recognize the form and
translate it to an IN SqlCall, and if you have another code generator, you
could generate code accordingly.
> Convert SqlInOperator to In-Expression
> --------------------------------------
>
> Key: CALCITE-2630
> URL: https://issues.apache.org/jira/browse/CALCITE-2630
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.17.0
> Reporter: pengzhiwei
> Assignee: Julian Hyde
> Priority: Major
>
> Currently Calcite translate "IN" to "OR" expression when the count of IN's
> operands less than "inSubQueryThreshold" or to "Join" when the operands
> count greater than "inSubQueryThreshold" to get better performance.
> However this translation to "JOIN" is so complex. Especially when the "IN"
> expression located in the "select" or "join on condition".
> For example:
> {code:java}
> select case when deptno in (1,2) then 0 else 1 end from emp
> {code}
> the logical plan generated as follow:
> {code:java}
> LogicalProject(EXPR$0=[CASE(CAST(CASE(=($9, 0), false, IS NOT NULL($13),
> true, IS NULL($11), null, <($10, $9), null, false)):BOOLEAN NOT NULL, 0, 1)])
> LogicalJoin(condition=[=($11, $12)], joinType=[left])
> LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4],
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$9], $f1=[$10],
> DEPTNO0=[$7])
> LogicalJoin(condition=[true], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
> LogicalProject(ROW_VALUE=[$0], $f1=[true])
> LogicalValues(tuples=[[{ 1 }, { 2 }]])
> LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(ROW_VALUE=[$0], $f1=[true])
> LogicalValues(tuples=[[{ 1 }, { 2 }]])
> {code}
> The generated logical plan is so complex for such a simple sql!
> I think we can treat "IN" as a function like "plus" and "minus".So there is
> no translation on "IN" and just keep it as it is.This would be much clear in
> the logical plan!
> In the execute stage,We can provide a "InExpression":
> {code:java}
> InExpression(left,condition0,condition1,...) {code}
> We can put all the constant conditions to a "Set".In that way,the
> computational complexity can reduce from O(n)to O(1).
> It would be much clear and have a good performance.
> PS: "In sub-query" is not included in our talk.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)