[
https://issues.apache.org/jira/browse/CALCITE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16655243#comment-16655243
]
Stamatis Zampetakis commented on CALCITE-2630:
----------------------------------------------
Sure, I just wanted to be sure that we are not going to break something about
subqueries and semi-joins.
{quote}
We can implement a InExpression for calcite runtime
{quote}
Great!
{quote}
The current translation for "IN expressions" to "join" is much harder to
implement for other sql-engine.
{quote}
That may be true but we shouldn't forget that some systems and rules are
already built and work with the current transformation. Depending on how the
change is going to be introduced it can break existing 3rd party rules and
systems.
Last but not least it is worth checking some previous discussions in the dev
list. The most recent one can be found
[here|https://mail-archives.apache.org/mod_mbox/calcite-dev/201810.mbox/%3CCAL4PLbiBh1HoP0w_5ScJ1Nnxq%2BNYGP2LO2usxg_17Gs1mYgttA%40mail.gmail.com%3E].
> Convert SqlInOperator to In-Expression
> --------------------------------------
>
> Key: CALCITE-2630
> URL: https://issues.apache.org/jira/browse/CALCITE-2630
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.17.0
> Reporter: pengzhiwei
> Assignee: Julian Hyde
> Priority: Major
>
> Currently Calcite translate "IN" to "OR" expression when the count of IN's
> operands less than "inSubQueryThreshold" or to "Join" when the operands
> count greater than "inSubQueryThreshold" to get better performance.
> However this translation to "JOIN" is so complex. Especially when the "IN"
> expression located in the "select" or "join on condition".
> For example:
> {code:java}
> select case when deptno in (1,2) then 0 else 1 end from emp
> {code}
> the logical plan generated as follow:
> {code:java}
> LogicalProject(EXPR$0=[CASE(CAST(CASE(=($9, 0), false, IS NOT NULL($13),
> true, IS NULL($11), null, <($10, $9), null, false)):BOOLEAN NOT NULL, 0, 1)])
> LogicalJoin(condition=[=($11, $12)], joinType=[left])
> LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4],
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$9], $f1=[$10],
> DEPTNO0=[$7])
> LogicalJoin(condition=[true], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
> LogicalProject(ROW_VALUE=[$0], $f1=[true])
> LogicalValues(tuples=[[{ 1 }, { 2 }]])
> LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(ROW_VALUE=[$0], $f1=[true])
> LogicalValues(tuples=[[{ 1 }, { 2 }]])
> {code}
> The generated logical plan is so complex for such a simple sql!
> I think we can treat "IN" as a function like "plus" and "minus".So there is
> no translation on "IN" and just keep it as it is.This would be much clear in
> the logical plan!
> In the execute stage,We can provide a "InExpression":
> {code:java}
> InExpression(left,condition0,condition1,...) {code}
> We can put all the constant conditions to a "Set".In that way,the
> computational complexity can reduce from O(n)to O(1).
> It would be much clear and have a good performance.
> PS: "In sub-query" is not included in our talk.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)