[ https://issues.apache.org/jira/browse/CALCITE-500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16086209#comment-16086209 ]
Vladimir Sitnikov commented on CALCITE-500:
-------------------------------------------
The test below was run against the following commit:
{noformat}
commit 9a5cd27415ea3a1a3955eaee2cb65aa2d69f62cf
Author: Junxian Wu <[email protected]>
Date: Thu Jul 13 10:57:20 2017 +0200
[CALCITE-1803] Push Project that follows Aggregate down to Druid (Junxian Wu)
{noformat}
I've added the following test to JdbcTest:
{code:java}
@Test public void testJoin() {
  CalciteAssert.that()
      .with(CalciteAssert.Config.SCOTT)
      .query("select e.deptno, d.DEPTNO from \"scott\".EMP e"
          + " join \"scott\".DEPT d on (e.deptno=d.DEPTNO)")
      // The check below is only there to make the plan get printed.
      .explainMatches(" INCLUDING ALL ATTRIBUTES ",
          checkResultContains("just print explain"));
}
{code}
It prints the following explain plan. Note that the DEPT table is estimated to
have fewer rows (4.0) than EMP (14.0).
{noformat}
PLAN=EnumerableCalc(expr#0..2=[{inputs}], DEPTNO=[$t2], DEPTNO0=[$t0]): rowcount = 8.4, cumulative cost = {72.35357744447957 rows, 218.0 cpu, 0.0 io}, id = 97
  EnumerableJoin(condition=[=($0, $2)], joinType=[inner]): rowcount = 8.4, cumulative cost = {63.95357744447956 rows, 176.0 cpu, 0.0 io}, id = 93
    EnumerableCalc(expr#0..2=[{inputs}], DEPTNO=[$t0]): rowcount = 4.0, cumulative cost = {8.0 rows, 21.0 cpu, 0.0 io}, id = 99
      EnumerableTableScan(table=[[scott, DEPT]]): rowcount = 4.0, cumulative cost = {4.0 rows, 5.0 cpu, 0.0 io}, id = 1
    EnumerableCalc(expr#0..7=[{inputs}], EMPNO=[$t0], DEPTNO=[$t7]): rowcount = 14.0, cumulative cost = {28.0 rows, 155.0 cpu, 0.0 io}, id = 101
      EnumerableTableScan(table=[[scott, EMP]]): rowcount = 14.0, cumulative cost = {14.0 rows, 15.0 cpu, 0.0 io}, id = 0
{noformat}
So my understanding is that DEPT should become the lookup (hashed) table, and
EMP should be scanned afterwards.
Let's check that.
I then replaced {{.explainMatches(...)}} with {{.returns("abcd")}} and set a
breakpoint in {{org.apache.calcite.linq4j.EnumerableDefaults#join_}} at
{code:java}
final Lookup<TKey, TInner> innerLookup =
    comparer == null
        ? inner.toLookup(innerKeySelector)
        : inner.toLookup(innerKeySelector, comparer);
{code}
It turns out that {{inner}} is the EMP table, that is, the larger input (14 rows versus 4 for DEPT).
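To make the observation concrete, here is a minimal, self-contained sketch of a hash join with the same argument shape as the linq4j call: the lookup is built from the second ("inner") argument and the first ("outer") argument is streamed. The class {{HashJoinSketch}}, its {{join}} method and the {{lastBuildSize}} counter are hypothetical names used only for illustration; this is not Calcite code, and only the build-on-inner behaviour is taken from the snippet above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Hypothetical illustration of "hash the inner input"; not Calcite code. */
public class HashJoinSketch {
  /** Number of rows hashed by the most recent call to join(). */
  public static int lastBuildSize;

  public static <O, I, K, R> List<R> join(List<O> outer, List<I> inner,
      Function<O, K> outerKey, Function<I, K> innerKey,
      BiFunction<O, I, R> resultSelector) {
    // Build phase: every row of the inner (second) input goes into the lookup.
    Map<K, List<I>> lookup = new HashMap<>();
    for (I row : inner) {
      lookup.computeIfAbsent(innerKey.apply(row), k -> new ArrayList<>()).add(row);
    }
    lastBuildSize = inner.size();

    // Probe phase: the outer (first) input is only streamed, never hashed.
    List<R> result = new ArrayList<>();
    for (O row : outer) {
      for (I match : lookup.getOrDefault(outerKey.apply(row), List.of())) {
        result.add(resultSelector.apply(row, match));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // 4 department numbers, standing in for DEPT.
    List<Integer> dept = List.of(10, 20, 30, 40);
    // 14 {empno, deptno} pairs, standing in for EMP (made-up values).
    List<int[]> emp = new ArrayList<>();
    for (int i = 0; i < 14; i++) {
      emp.add(new int[] {7000 + i, 10 + 10 * (i % 3)});
    }
    // Same shape as the generated code: the smaller input is passed first.
    List<String> rows =
        join(dept, emp, d -> d, e -> e[1], (d, e) -> d + ":" + e[0]);
    System.out.println(rows.size() + " rows, hashed " + lastBuildSize + " rows");
  }
}
```

With the smaller input passed first, as the planner arranges it, the hash table ends up holding the larger input, which is the mismatch described here.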
Just in case, the generated code was:
{code:java}
org.apache.calcite.DataContext root;

public org.apache.calcite.linq4j.Enumerable bind(
    final org.apache.calcite.DataContext root0) {
  root = root0;
  final org.apache.calcite.linq4j.Enumerable _inputEnumerable =
      org.apache.calcite.schema.Schemas.queryable(root,
          root.getRootSchema().getSubSchema("scott"), java.lang.Object[].class,
          "DEPT").asEnumerable();
  final org.apache.calcite.linq4j.AbstractEnumerable left =
      new org.apache.calcite.linq4j.AbstractEnumerable(){
    public org.apache.calcite.linq4j.Enumerator enumerator() {
      return new org.apache.calcite.linq4j.Enumerator(){
          public final org.apache.calcite.linq4j.Enumerator inputEnumerator =
              _inputEnumerable.enumerator();
          public void reset() {
            inputEnumerator.reset();
          }

          public boolean moveNext() {
            return inputEnumerator.moveNext();
          }

          public void close() {
            inputEnumerator.close();
          }

          public Object current() {
            return org.apache.calcite.runtime.SqlFunctions.toByte(
                ((Object[]) inputEnumerator.current())[0]);
          }

        };
    }

  };
  final org.apache.calcite.linq4j.Enumerable _inputEnumerable0 =
      org.apache.calcite.schema.Schemas.queryable(root,
          root.getRootSchema().getSubSchema("scott"), java.lang.Object[].class,
          "EMP").asEnumerable();
  final org.apache.calcite.linq4j.AbstractEnumerable right =
      new org.apache.calcite.linq4j.AbstractEnumerable(){
    public org.apache.calcite.linq4j.Enumerator enumerator() {
      return new org.apache.calcite.linq4j.Enumerator(){
          public final org.apache.calcite.linq4j.Enumerator inputEnumerator =
              _inputEnumerable0.enumerator();
          public void reset() {
            inputEnumerator.reset();
          }

          public boolean moveNext() {
            return inputEnumerator.moveNext();
          }

          public void close() {
            inputEnumerator.close();
          }

          public Object current() {
            final Object[] current = (Object[]) inputEnumerator.current();
            return new Object[] {
                current[0],
                current[7]};
          }

        };
    }

  };
  final org.apache.calcite.linq4j.Enumerable _inputEnumerable1 =
      left.join(right, new org.apache.calcite.linq4j.function.Function1() {
      public byte apply(byte v1) {
        return v1;
      }
      public Object apply(Byte v1) {
        return apply(
          v1.byteValue());
      }
      public Object apply(Object v1) {
        return apply(
          (Byte) v1);
      }
    }
    , new org.apache.calcite.linq4j.function.Function1() {
      public Byte apply(Object[] v1) {
        return (Byte) v1[1];
      }
      public Object apply(Object v1) {
        return apply(
          (Object[]) v1);
      }
    }
    , new org.apache.calcite.linq4j.function.Function2() {
      public Object[] apply(Byte left, Object[] right) {
        return new Object[] {
            left,
            right[0],
            right[1]};
      }
      public Object[] apply(Object left, Object right) {
        return apply(
          (Byte) left,
          (Object[]) right);
      }
    }
    , null, false, false);
  return new org.apache.calcite.linq4j.AbstractEnumerable(){
      public org.apache.calcite.linq4j.Enumerator enumerator() {
        return new org.apache.calcite.linq4j.Enumerator(){
            public final org.apache.calcite.linq4j.Enumerator inputEnumerator =
                _inputEnumerable1.enumerator();
            public void reset() {
              inputEnumerator.reset();
            }

            public boolean moveNext() {
              return inputEnumerator.moveNext();
            }

            public void close() {
              inputEnumerator.close();
            }

            public Object current() {
              final Object[] current = (Object[]) inputEnumerator.current();
              return new Object[] {
                  current[2],
                  current[0]};
            }

          };
      }

    };
}


public Class getElementType() {
  return java.lang.Object[].class;
}

{code}
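In the generated code, {{left}} reads DEPT and {{right}} reads EMP, so {{left.join(right, ...)}} passes EMP as the inner input. One possible way to get the lookup onto the smaller side is to pick the build input by size and swap the result-selector arguments back, so the output columns keep their order. The sketch below is only an illustration of that idea under stated assumptions: {{SmallerSideHashJoin}} and {{hashAndProbe}} are hypothetical names, it is not Calcite code, and the list sizes stand in for the planner's row-count estimates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Hypothetical illustration of "hash the smaller input"; not Calcite code. */
public class SmallerSideHashJoin {
  /** Number of rows hashed by the most recent call to join(). */
  public static int lastBuildSize;

  public static <L, R, K, T> List<T> join(List<L> left, List<R> right,
      Function<L, K> leftKey, Function<R, K> rightKey,
      BiFunction<L, R, T> resultSelector) {
    if (left.size() <= right.size()) {
      // Left is not larger: hash left, probe with right.
      return hashAndProbe(left, right, leftKey, rightKey, resultSelector);
    }
    // Right is smaller: hash right, probe with left, and swap the arguments
    // back so the result selector still sees (left, right).
    return hashAndProbe(right, left, rightKey, leftKey,
        (b, p) -> resultSelector.apply(p, b));
  }

  private static <B, P, K, T> List<T> hashAndProbe(List<B> build, List<P> probe,
      Function<B, K> buildKey, Function<P, K> probeKey,
      BiFunction<B, P, T> combine) {
    // Build phase: hash the chosen build input.
    Map<K, List<B>> lookup = new HashMap<>();
    for (B row : build) {
      lookup.computeIfAbsent(buildKey.apply(row), k -> new ArrayList<>()).add(row);
    }
    lastBuildSize = build.size();

    // Probe phase: stream the other input; rows come out in probe order.
    List<T> result = new ArrayList<>();
    for (P row : probe) {
      for (B match : lookup.getOrDefault(probeKey.apply(row), List.of())) {
        result.add(combine.apply(match, row));
      }
    }
    return result;
  }
}
```

Note that rows are emitted in the order of the probe input, so the row order of the result can depend on which side was hashed; the set of joined rows does not.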
> Ensure EnumerableJoin hashes the smallest input
> -----------------------------------------------
>
> Key: CALCITE-500
> URL: https://issues.apache.org/jira/browse/CALCITE-500
> Project: Calcite
> Issue Type: Bug
> Affects Versions: 1.0.0-incubating
> Reporter: Vladimir Sitnikov
> Assignee: Atri Sharma
> Labels: newbie
>
> {{EnumerableJoin}} tries to put the smallest input first; however, when it
> comes to execution, Calcite creates the lookup for the _second_ input of the join.
> It would be nice to ensure the lookup is created on the smallest input.