I agree with everything you said. The best plan is to join inside the JDBC 
source, and we’d hope that Calcite could find that plan.

It is curious that the JdbcFilter has infinite cost. I can’t tell by reading 
the trace output exactly why. I wonder whether it has failed to make a 
JdbcTableScan of the correct table within the correct JDBC database (which, as 
you note, is a particular instance of JdbcConvention).

There’s clearly a Calcite bug here, likely in the JDBC adapter. Can you log it?

You may be able to convert this into a test case on the standard data sources, 
e.g. by combining the scott and foodmart hsqldb databases.

Julian

> On May 26, 2016, at 1:46 AM, Jordan Halterman <[email protected]> 
> wrote:
> 
> I've been doing some experimenting with the Calcite Planner and am curious
> about a specific plan that's being generated. I've set up two JDBC schemas,
> one with "users" and "items" tables, and one with "orders" and "purchases".
> The Planner is set up using the same rules that are used in
> CalcitePrepareImpl with EnumerableConvention.
> 
> When I optimize a simple query that joins two tables within the same
> database, I get the expected plan, e.g.:
> 
> SELECT u.id AS user_id, i.id AS item_id, i.name, i.description FROM users u
> JOIN items i ON u.id = i.user_id
> 
> JdbcToEnumerableConverter: rowcount = 1500.0, cumulative cost = {3050.0
> rows, 5152.0 cpu, 0.0 io}, id = 43
>  JdbcProject(user_id=[$0], item_id=[$5], name=[$7], description=[$8]):
> rowcount = 1500.0, cumulative cost = {2900.0 rows, 5002.0 cpu, 0.0 io}, id
> = 42
>    JdbcJoin(condition=[=($0, $6)], joinType=[inner]): rowcount = 1500.0,
> cumulative cost = {1700.0 rows, 202.0 cpu, 0.0 io}, id = 41
>      JdbcTableScan(table=[[USERS]]): rowcount = 100.0, cumulative cost =
> {100.0 rows, 101.0 cpu, 0.0 io}, id = 5
>      JdbcTableScan(table=[[ITEMS]]): rowcount = 100.0, cumulative cost =
> {100.0 rows, 101.0 cpu, 0.0 io}, id = 6
> 
> However, when I optimize a query that joins two tables in one database with
> a table in another database, I don't get the plan I expect. None of the
> joins or filters are done in JDBC, only table scans e.g.:
> 
> SELECT u.id, o.id AS order_id FROM users u JOIN orders o ON u.id =
> o.user_id JOIN purchases p ON o.id = p.order_id WHERE p.order_id < 50
> 
> EnumerableProject(id=[$6], order_id=[$0]): rowcount = 11250.0, cumulative
> cost = {29206.571923496576 rows, 22933.0 cpu, 0.0 io}, id = 141
>  EnumerableJoin(condition=[=($1, $6)], joinType=[inner]): rowcount =
> 11250.0, cumulative cost = {17956.571923496576 rows, 433.0 cpu, 0.0 io}, id
> = 140
>    EnumerableJoin(condition=[=($0, $3)], joinType=[inner]): rowcount =
> 750.0, cumulative cost = {1531.517018598809 rows, 322.0 cpu, 0.0 io}, id =
> 138
>      JdbcToEnumerableConverter: rowcount = 100.0, cumulative cost = {110.0
> rows, 111.0 cpu, 0.0 io}, id = 135
>        JdbcTableScan(table=[[ORDERS]]): rowcount = 100.0, cumulative cost
> = {100.0 rows, 101.0 cpu, 0.0 io}, id = 10
>      EnumerableFilter(condition=[<($1, 50)]): rowcount = 50.0, cumulative
> cost = {160.0 rows, 211.0 cpu, 0.0 io}, id = 137
>        JdbcToEnumerableConverter: rowcount = 100.0, cumulative cost =
> {110.0 rows, 111.0 cpu, 0.0 io}, id = 136
>          JdbcTableScan(table=[[PURCHASES]]): rowcount = 100.0, cumulative
> cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 12
>    JdbcToEnumerableConverter: rowcount = 100.0, cumulative cost = {110.0
> rows, 111.0 cpu, 0.0 io}, id = 139
>      JdbcTableScan(table=[[USERS]]): rowcount = 100.0, cumulative cost =
> {100.0 rows, 101.0 cpu, 0.0 io}, id = 9
> 
> It seems to me that the EnumerableFilter and EnumerableJoin of ORDERS and
> PURCHASES (which each have the same JdbcConvention) could be done in JDBC,
> i.e. something more like this:
> 
> EnumerableProject
>  EnumerableJoin
>    JdbcToEnumerableConverter
>      JdbcJoin
>        JdbcTableScan(table=[[ORDERS]]: ...)
>        JdbcFilter
>          JdbcTableScan(table=[[PURCHASES]]: ...)
>    JdbcToEnumerableConverter
>      JdbcTableScan(table=[[USERS]]: ...)
> 
> So, my question is, why is the join and even the filter not pushed down to
> JDBC? When I log the trace, I see that the JdbcJoin and JdbcFilter have an
> {inf} cost, but I'm ignorant as to why that is. Here's the trace:
> 
> Set#0, type: RecordType(INTEGER ID, VARCHAR(1) NAME, VARCHAR(1) EMAIL,
> VARCHAR(1) CREATED_AT, VARCHAR(1) UPDATED_AT)
> rel#38:Subset#0.JDBC.site.[], best=rel#9, importance=0.7290000000000001
> rel#9:JdbcTableScan.JDBC.site.[](table=[USERS]), rowcount=100.0, cumulative
> cost={100.0 rows, 101.0 cpu, 0.0 io}
> rel#97:Subset#0.ENUMERABLE.[0], best=null, importance=0.7290000000000001
> rel#100:Subset#0.ENUMERABLE.[], best=rel#120, importance=0.7290000000000001
> rel#120:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#38:Subset#0.JDBC.site.[]),
> rowcount=100.0, cumulative cost={110.0 rows, 111.0 cpu, 0.0 io}
> Set#1, type: RecordType(INTEGER ID, INTEGER USER_ID)
> rel#39:Subset#1.JDBC.cart.[], best=rel#10, importance=0.7290000000000001
> rel#10:JdbcTableScan.JDBC.cart.[](table=[ORDERS]), rowcount=100.0,
> cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
> rel#124:Subset#1.ENUMERABLE.[], best=rel#123, importance=0.6561
> rel#123:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#39:Subset#1.JDBC.cart.[]),
> rowcount=100.0, cumulative cost={110.0 rows, 111.0 cpu, 0.0 io}
> rel#127:Subset#1.JDBC.site.[], best=null, importance=0.5904900000000001
> rel#131:Subset#1.ENUMERABLE.[0], best=null, importance=0.6561
> Set#2, type: RecordType(INTEGER ID, INTEGER ORDER_ID, INTEGER ITEM_ID,
> VARCHAR(1) CREATED_AT)
> rel#40:Subset#2.JDBC.cart.[], best=rel#12, importance=0.6561
> rel#12:JdbcTableScan.JDBC.cart.[](table=[PURCHASES]), rowcount=100.0,
> cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
> rel#91:Subset#2.JDBC.site.[], best=null, importance=0.531441
> rel#102:Subset#2.ENUMERABLE.[], best=rel#114, importance=0.5904900000000001
> rel#114:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#40:Subset#2.JDBC.cart.[]),
> rowcount=100.0, cumulative cost={110.0 rows, 111.0 cpu, 0.0 io}
> Set#3, type: RecordType(INTEGER ID, INTEGER ORDER_ID, INTEGER ITEM_ID,
> VARCHAR(1) CREATED_AT)
> rel#42:Subset#3.NONE.[], best=null, importance=0.7290000000000001
> rel#41:LogicalFilter.NONE.[](input=rel#40:Subset#2.JDBC.cart.[],condition=<($1,
> 50)), rowcount=50.0, cumulative cost={inf}
> rel#93:Subset#3.JDBC.site.[], best=null, importance=0.5904900000000001
> rel#92:JdbcFilter.JDBC.site.[](input=rel#91:Subset#2.JDBC.site.[],condition=<($1,
> 50)), rowcount=50.0, cumulative cost={inf}
> rel#104:Subset#3.ENUMERABLE.[], best=rel#103, importance=0.6561
> rel#103:EnumerableFilter.ENUMERABLE.[](input=rel#102:Subset#2.ENUMERABLE.[],condition=<($1,
> 50)), rowcount=50.0, cumulative cost={160.0 rows, 211.0 cpu, 0.0 io}
> rel#115:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#93:Subset#3.JDBC.site.[]),
> rowcount=50.0, cumulative cost={inf}
> rel#132:Subset#3.ENUMERABLE.[1], best=null, importance=0.6561
> Set#4, type: RecordType(INTEGER ID, VARCHAR(1) NAME, VARCHAR(1) EMAIL,
> VARCHAR(1) CREATED_AT, VARCHAR(1) UPDATED_AT, INTEGER ID0, INTEGER USER_ID,
> INTEGER ID1, INTEGER ORDER_ID, INTEGER ITEM_ID, VARCHAR(1) CREATED_AT0)
> rel#44:Subset#4.NONE.[], best=null, importance=0.81
> rel#43:MultiJoin.NONE.[](input#0=rel#38:Subset#0.JDBC.site.[],input#1=rel#39:Subset#1.JDBC.cart.[],input#2=rel#42:Subset#3.NONE.[],joinFilter=AND(=($5,
> $8), =($0, $6)),isFullOuterJoin=false,joinTypes=[INNER, INNER,
> INNER],outerJoinConditions=[NULL, NULL, NULL],projFields=[ALL, ALL, ALL]),
> rowcount=1.0, cumulative cost={inf}
> rel#76:LogicalProject.NONE.[](input=rel#75:Subset#7.NONE.[],ID=$6,NAME=$7,EMAIL=$8,CREATED_AT=$9,UPDATED_AT=$10,ID0=$0,USER_ID=$1,ID1=$2,ORDER_ID=$3,ITEM_ID=$4,CREATED_AT0=$5),
> rowcount=11250.0, cumulative cost={inf}
> rel#49:Subset#4.JDBC.site.[], best=null, importance=0.81
> rel#88:JdbcProject.JDBC.site.[](input=rel#82:Subset#7.JDBC.site.[],ID=$6,NAME=$7,EMAIL=$8,CREATED_AT=$9,UPDATED_AT=$10,ID0=$0,USER_ID=$1,ID1=$2,ORDER_ID=$3,ITEM_ID=$4,CREATED_AT0=$5),
> rowcount=11250.0, cumulative cost={inf}
> rel#56:Subset#4.ENUMERABLE.[], best=rel#94, importance=0.9
> rel#94:EnumerableProject.ENUMERABLE.[](input=rel#86:Subset#7.ENUMERABLE.[],ID=$6,NAME=$7,EMAIL=$8,CREATED_AT=$9,UPDATED_AT=$10,ID0=$0,USER_ID=$1,ID1=$2,ORDER_ID=$3,ITEM_ID=$4,CREATED_AT0=$5),
> rowcount=11250.0, cumulative cost={29206.571923496576 rows, 124183.0 cpu,
> 0.0 io}
> rel#113:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#49:Subset#4.JDBC.site.[]),
> rowcount=1.0, cumulative cost={inf}
> Set#5, type: RecordType(INTEGER id, INTEGER order_id)
> rel#46:Subset#5.NONE.[], best=null, importance=0.9
> rel#45:LogicalProject.NONE.[](input=rel#44:Subset#4.NONE.[],id=$0,order_id=$5),
> rowcount=1.0, cumulative cost={inf}
> rel#81:LogicalProject.NONE.[](input=rel#75:Subset#7.NONE.[],id=$6,order_id=$0),
> rowcount=11250.0, cumulative cost={inf}
> rel#95:LogicalProject.NONE.[](input=rel#86:Subset#7.ENUMERABLE.[],id=$6,order_id=$0),
> rowcount=11250.0, cumulative cost={inf}
> rel#107:LogicalProject.NONE.[](input=rel#82:Subset#7.JDBC.site.[],id=$6,order_id=$0),
> rowcount=11250.0, cumulative cost={inf}
> rel#47:Subset#5.ENUMERABLE.[], best=rel#87, importance=1.0
> rel#48:AbstractConverter.ENUMERABLE.[](input=rel#46:Subset#5.NONE.[],convention=ENUMERABLE,sort=[]),
> rowcount=1.0, cumulative cost={inf}
> rel#52:AbstractConverter.ENUMERABLE.[](input=rel#51:Subset#5.JDBC.site.[],convention=ENUMERABLE,sort=[]),
> rowcount=1.0, cumulative cost={inf}
> rel#53:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#51:Subset#5.JDBC.site.[]),
> rowcount=1.0, cumulative cost={inf}
> rel#57:EnumerableProject.ENUMERABLE.[](input=rel#56:Subset#4.ENUMERABLE.[],id=$0,order_id=$5),
> rowcount=11250.0, cumulative cost={40456.57192349658 rows, 146683.0 cpu,
> 0.0 io}
> rel#87:EnumerableProject.ENUMERABLE.[](input=rel#86:Subset#7.ENUMERABLE.[],id=$6,order_id=$0),
> rowcount=11250.0, cumulative cost={29206.571923496576 rows, 22933.0 cpu,
> 0.0 io}
> rel#51:Subset#5.JDBC.site.[], best=null, importance=0.9
> rel#50:JdbcProject.JDBC.site.[](input=rel#49:Subset#4.JDBC.site.[],id=$0,order_id=$5),
> rowcount=1.0, cumulative cost={inf}
> rel#83:JdbcProject.JDBC.site.[](input=rel#82:Subset#7.JDBC.site.[],id=$6,order_id=$0),
> rowcount=11250.0, cumulative cost={inf}
> Set#6, type: RecordType(INTEGER ID, INTEGER USER_ID, INTEGER ID0, INTEGER
> ORDER_ID, INTEGER ITEM_ID, VARCHAR(1) CREATED_AT)
> rel#73:Subset#6.NONE.[], best=null, importance=0.6561
> rel#60:LogicalJoin.NONE.[](left=rel#39:Subset#1.JDBC.cart.[],right=rel#42:Subset#3.NONE.[],condition==($0,
> $3),joinType=inner), rowcount=750.0, cumulative cost={inf}
> rel#89:Subset#6.JDBC.site.[], best=null, importance=0.6561
> rel#128:JdbcJoin.JDBC.site.[](left=rel#127:Subset#1.JDBC.site.[],right=rel#93:Subset#3.JDBC.site.[],condition==($0,
> $3),joinType=inner), rowcount=750.0, cumulative cost={inf}
> rel#96:Subset#6.ENUMERABLE.[1], best=null, importance=0.7290000000000001
> rel#99:Subset#6.ENUMERABLE.[], best=rel#134, importance=0.7290000000000001
> rel#130:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#89:Subset#6.JDBC.site.[]),
> rowcount=750.0, cumulative cost={inf}
> rel#133:EnumerableMergeJoin.ENUMERABLE.[[0],
> [3]](left=rel#131:Subset#1.ENUMERABLE.[0],right=rel#132:Subset#3.ENUMERABLE.[1],condition==($0,
> $3),joinType=inner), rowcount=750.0, cumulative cost={inf}
> rel#134:EnumerableJoin.ENUMERABLE.[](left=rel#124:Subset#1.ENUMERABLE.[],right=rel#104:Subset#3.ENUMERABLE.[],condition==($0,
> $3),joinType=inner), rowcount=750.0, cumulative cost={1531.517018598809
> rows, 322.0 cpu, 0.0 io}
> Set#7, type: RecordType(INTEGER ID, INTEGER USER_ID, INTEGER ID0, INTEGER
> ORDER_ID, INTEGER ITEM_ID, VARCHAR(1) CREATED_AT, INTEGER ID1, VARCHAR(1)
> NAME, VARCHAR(1) EMAIL, VARCHAR(1) CREATED_AT0, VARCHAR(1) UPDATED_AT)
> rel#75:Subset#7.NONE.[], best=null, importance=0.7290000000000001
> rel#74:LogicalJoin.NONE.[](left=rel#73:Subset#6.NONE.[],right=rel#38:Subset#0.JDBC.site.[],condition==($6,
> $1),joinType=inner), rowcount=11250.0, cumulative cost={inf}
> rel#82:Subset#7.JDBC.site.[], best=null, importance=0.7290000000000001
> rel#90:JdbcJoin.JDBC.site.[](left=rel#89:Subset#6.JDBC.site.[],right=rel#38:Subset#0.JDBC.site.[],condition==($6,
> $1),joinType=inner), rowcount=11250.0, cumulative cost={inf}
> rel#86:Subset#7.ENUMERABLE.[], best=rel#101, importance=0.81
> rel#98:EnumerableMergeJoin.ENUMERABLE.[[1],
> [6]](left=rel#96:Subset#6.ENUMERABLE.[1],right=rel#97:Subset#0.ENUMERABLE.[0],condition==($1,
> $6),joinType=inner), rowcount=11250.0, cumulative cost={inf}
> rel#101:EnumerableJoin.ENUMERABLE.[](left=rel#99:Subset#6.ENUMERABLE.[],right=rel#100:Subset#0.ENUMERABLE.[],condition==($1,
> $6),joinType=inner), rowcount=11250.0, cumulative cost={17956.571923496576
> rows, 433.0 cpu, 0.0 io}
> rel#109:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#82:Subset#7.JDBC.site.[]),
> rowcount=11250.0, cumulative cost={inf}

Reply via email to