[jira] [Commented] (CALCITE-500) Ensure EnumerableJoin hashes the smallest input

Vladimir Sitnikov (JIRA) Thu, 13 Jul 2017 11:51:13 -0700

    [ https://issues.apache.org/jira/browse/CALCITE-500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16086209#comment-16086209 ]


Vladimir Sitnikov commented on CALCITE-500:
-------------------------------------------

The test below was run at
{noformat}
commit 9a5cd27415ea3a1a3955eaee2cb65aa2d69f62cf
Author: Junxian Wu <[email protected]>
Date:   Thu Jul 13 10:57:20 2017 +0200

    [CALCITE-1803] Push Project that follows Aggregate down to Druid (Junxian Wu)
{noformat}

I've added the following test to JdbcTest:
{code:java}
  @Test public void testJoin() {
    CalciteAssert.that()
        .with(CalciteAssert.Config.SCOTT)
        .query("select e.deptno, d.DEPTNO from \"scott\".EMP e join \"scott\".DEPT d on (e.deptno=d.DEPTNO)")
        // The expected text never matches, so the failing assertion prints the full plan
        .explainMatches("  INCLUDING ALL ATTRIBUTES ",
            checkResultContains("just print explain"));
  }
{code}
It prints the following explain plan. Note that it estimates the DEPT table to have fewer rows (4.0 vs 14.0 for EMP).
{noformat}
PLAN=EnumerableCalc(expr#0..2=[{inputs}], DEPTNO=[$t2], DEPTNO0=[$t0]): rowcount = 8.4, cumulative cost = {72.35357744447957 rows, 218.0 cpu, 0.0 io}, id = 97
  EnumerableJoin(condition=[=($0, $2)], joinType=[inner]): rowcount = 8.4, cumulative cost = {63.95357744447956 rows, 176.0 cpu, 0.0 io}, id = 93
    EnumerableCalc(expr#0..2=[{inputs}], DEPTNO=[$t0]): rowcount = 4.0, cumulative cost = {8.0 rows, 21.0 cpu, 0.0 io}, id = 99
      EnumerableTableScan(table=[[scott, DEPT]]): rowcount = 4.0, cumulative cost = {4.0 rows, 5.0 cpu, 0.0 io}, id = 1
    EnumerableCalc(expr#0..7=[{inputs}], EMPNO=[$t0], DEPTNO=[$t7]): rowcount = 14.0, cumulative cost = {28.0 rows, 155.0 cpu, 0.0 io}, id = 101
      EnumerableTableScan(table=[[scott, EMP]]): rowcount = 14.0, cumulative cost = {14.0 rows, 15.0 cpu, 0.0 io}, id = 0
{noformat}

So my understanding is that DEPT, the smaller input, should become the lookup (hash) table, and EMP should then be scanned against it.
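For context, a textbook in-memory hash join stores one input (the build side) in a hash table and streams the other (the probe side) through it, so memory use and build time grow with the build side. Below is a minimal sketch of that idea, not Calcite's code; the class {{HashJoinSketch}}, its method signature and the sample rows are hypothetical.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

public class HashJoinSketch {

  /** Inner equi-join that hashes {@code build} and streams {@code probe}. */
  public static <B, P, K, R> List<R> hashJoin(List<B> build, List<P> probe,
      Function<B, K> buildKey, Function<P, K> probeKey,
      BiFunction<B, P, R> result) {
    // Build phase: every build row is stored in the hash table, grouped by key.
    Map<K, List<B>> table = new HashMap<>();
    for (B b : build) {
      table.computeIfAbsent(buildKey.apply(b), k -> new ArrayList<>()).add(b);
    }
    // Probe phase: each probe row is looked up once and never stored.
    List<R> out = new ArrayList<>();
    for (P p : probe) {
      List<B> matches = table.get(probeKey.apply(p));
      if (matches != null) {
        for (B b : matches) {
          out.add(result.apply(b, p));
        }
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Hypothetical rows: DEPT as {deptno}, EMP as {empno, deptno}.
    List<int[]> dept = List.of(new int[] {10}, new int[] {20}, new int[] {30}, new int[] {40});
    List<int[]> emp = List.of(new int[] {1, 10}, new int[] {2, 20},
        new int[] {3, 20}, new int[] {4, 50});
    // Build on the smaller input (DEPT), probe with the larger one (EMP).
    List<String> rows = hashJoin(dept, emp, d -> d[0], e -> e[1],
        (d, e) -> "emp " + e[0] + " -> dept " + d[0]);
    // prints "emp 1 -> dept 10", "emp 2 -> dept 20", "emp 3 -> dept 20" (emp 4 has no dept)
    rows.forEach(System.out::println);
  }
}
{code}

Passing the inputs the other way round returns the same matches, but then all of EMP is held in the hash table, which is the situation this issue is about.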
Let's check that.

I've replaced {{explainMatches(...)}} with {{.returns("abcd")}} and set a breakpoint in {{org.apache.calcite.linq4j.EnumerableDefaults#join_}} at
{code:java}
// The lookup (hash table) is built from "inner"
final Lookup<TKey, TInner> innerLookup =
    comparer == null
        ? inner.toLookup(innerKeySelector)
        : inner.toLookup(innerKeySelector, comparer);
{code}

It turns out that {{inner}} is the EMP table, i.e. the larger input (14 estimated rows vs 4) is the one being hashed.
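Because {{EnumerableDefaults#join_}} builds the lookup on the input passed second, one possible direction is to swap the inputs whenever the first one is smaller. Each key selector travels with its input, and the result selector's arguments are flipped back so the output columns stay the same. This is a hypothetical sketch, not a proposed patch: {{joinHashingInner}} and {{joinHashingSmaller}} are made-up names, the former only mimics the contract seen at the breakpoint, and list sizes stand in for the planner's row-count estimates.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

public class SwapJoinSketch {

  /** Mimics the observed contract: the lookup is always built on {@code inner}. */
  public static <O, I, K, R> List<R> joinHashingInner(List<O> outer, List<I> inner,
      Function<O, K> outerKey, Function<I, K> innerKey, BiFunction<O, I, R> result) {
    Map<K, List<I>> lookup = new HashMap<>();
    for (I i : inner) {
      lookup.computeIfAbsent(innerKey.apply(i), k -> new ArrayList<>()).add(i);
    }
    List<R> out = new ArrayList<>();
    for (O o : outer) {
      for (I i : lookup.getOrDefault(outerKey.apply(o), List.of())) {
        out.add(result.apply(o, i));
      }
    }
    return out;
  }

  /** Hashes the smaller input; output rows keep the (outer, inner) column layout. */
  public static <O, I, K, R> List<R> joinHashingSmaller(List<O> outer, List<I> inner,
      Function<O, K> outerKey, Function<I, K> innerKey, BiFunction<O, I, R> result) {
    if (outer.size() < inner.size()) {
      // Swap inputs and key selectors, then flip the result selector's arguments back.
      return joinHashingInner(inner, outer, innerKey, outerKey,
          (i, o) -> result.apply(o, i));
    }
    return joinHashingInner(outer, inner, outerKey, innerKey, result);
  }

  public static void main(String[] args) {
    // Hypothetical rows shaped like the plan: DEPT -> deptno, EMP -> {empno, deptno}.
    List<Integer> dept = List.of(10, 20, 30, 40);
    List<int[]> emp = List.of(new int[] {1, 20}, new int[] {2, 10}, new int[] {3, 20},
        new int[] {4, 30}, new int[] {5, 10}, new int[] {6, 30});
    BiFunction<Integer, int[], String> row = (d, e) -> d + ":" + e[0];
    // Hashes EMP; prints [10:2, 10:5, 20:1, 20:3, 30:4, 30:6]
    System.out.println(joinHashingInner(dept, emp, d -> d, e -> e[1], row));
    // Hashes DEPT; prints [20:1, 10:2, 20:3, 30:4, 10:5, 30:6]
    System.out.println(joinHashingSmaller(dept, emp, d -> d, e -> e[1], row));
  }
}
{code}

Both calls return the same rows; only the order differs, because the streamed input drives the output order. A swap like this is therefore only appropriate where nothing downstream relies on the join emitting rows in the order of its first input.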

For reference, the generated code was:
{code:java}
org.apache.calcite.DataContext root;

public org.apache.calcite.linq4j.Enumerable bind(final org.apache.calcite.DataContext root0) {
  root = root0;
  final org.apache.calcite.linq4j.Enumerable _inputEnumerable = org.apache.calcite.schema.Schemas.queryable(root, root.getRootSchema().getSubSchema("scott"), java.lang.Object[].class, "DEPT").asEnumerable();
  final org.apache.calcite.linq4j.AbstractEnumerable left = new org.apache.calcite.linq4j.AbstractEnumerable(){
    public org.apache.calcite.linq4j.Enumerator enumerator() {
      return new org.apache.calcite.linq4j.Enumerator(){
          public final org.apache.calcite.linq4j.Enumerator inputEnumerator = _inputEnumerable.enumerator();
          public void reset() {
            inputEnumerator.reset();
          }

          public boolean moveNext() {
            return inputEnumerator.moveNext();
          }

          public void close() {
            inputEnumerator.close();
          }

          public Object current() {
            return org.apache.calcite.runtime.SqlFunctions.toByte(((Object[]) inputEnumerator.current())[0]);
          }

        };
    }

  };
  final org.apache.calcite.linq4j.Enumerable _inputEnumerable0 = org.apache.calcite.schema.Schemas.queryable(root, root.getRootSchema().getSubSchema("scott"), java.lang.Object[].class, "EMP").asEnumerable();
  final org.apache.calcite.linq4j.AbstractEnumerable right = new org.apache.calcite.linq4j.AbstractEnumerable(){
    public org.apache.calcite.linq4j.Enumerator enumerator() {
      return new org.apache.calcite.linq4j.Enumerator(){
          public final org.apache.calcite.linq4j.Enumerator inputEnumerator = _inputEnumerable0.enumerator();
          public void reset() {
            inputEnumerator.reset();
          }

          public boolean moveNext() {
            return inputEnumerator.moveNext();
          }

          public void close() {
            inputEnumerator.close();
          }

          public Object current() {
            final Object[] current = (Object[]) inputEnumerator.current();
            return new Object[] {
                current[0],
                current[7]};
          }

        };
    }

  };
  final org.apache.calcite.linq4j.Enumerable _inputEnumerable1 = left.join(right, new org.apache.calcite.linq4j.function.Function1() {
    public byte apply(byte v1) {
      return v1;
    }
    public Object apply(Byte v1) {
      return apply(
        v1.byteValue());
    }
    public Object apply(Object v1) {
      return apply(
        (Byte) v1);
    }
  }
  , new org.apache.calcite.linq4j.function.Function1() {
    public Byte apply(Object[] v1) {
      return (Byte) v1[1];
    }
    public Object apply(Object v1) {
      return apply(
        (Object[]) v1);
    }
  }
  , new org.apache.calcite.linq4j.function.Function2() {
    public Object[] apply(Byte left, Object[] right) {
      return new Object[] {
          left,
          right[0],
          right[1]};
    }
    public Object[] apply(Object left, Object right) {
      return apply(
        (Byte) left,
        (Object[]) right);
    }
  }
  , null, false, false);
  return new org.apache.calcite.linq4j.AbstractEnumerable(){
      public org.apache.calcite.linq4j.Enumerator enumerator() {
        return new org.apache.calcite.linq4j.Enumerator(){
            public final org.apache.calcite.linq4j.Enumerator inputEnumerator = _inputEnumerable1.enumerator();
            public void reset() {
              inputEnumerator.reset();
            }

            public boolean moveNext() {
              return inputEnumerator.moveNext();
            }

            public void close() {
              inputEnumerator.close();
            }

            public Object current() {
              final Object[] current = (Object[]) inputEnumerator.current();
              return new Object[] {
                  current[2],
                  current[0]};
            }

          };
      }

    };
}


public Class getElementType() {
  return java.lang.Object[].class;
}
{code}

> Ensure EnumerableJoin hashes the smallest input
> -----------------------------------------------
>
>                 Key: CALCITE-500
>                 URL: https://issues.apache.org/jira/browse/CALCITE-500
>             Project: Calcite
>          Issue Type: Bug
>    Affects Versions: 1.0.0-incubating
>            Reporter: Vladimir Sitnikov
>            Assignee: Atri Sharma
>              Labels: newbie
>
> {{EnumerableJoin}} tries to put the smallest input first; however, at execution time, Calcite creates the lookup for the _second_ input of the join.
> It would be nice to ensure the lookup is created on the smallest input.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)