Glad we're seeing the same thing. I can take a shot at reproducing it in a test.
On Thu, May 26, 2016 at 1:17 PM, Julian Hyde <[email protected]> wrote: > I agree with everything you said. The best plan is to join inside the JDBC > source, and we’d hope that Calcite could find that plan. > > It is curious that the JdbcFilter has infinite cost. I can’t tell by > reading the trace output exactly why. I wonder whether it has failed to > make a JdbcTableScan of the correct table within the correct JDBC database > (which, as you note, is a particular instance of JdbcConvention). > > There’s clearly a Calcite bug here, likely in the JDBC adapter. Can you > log it? > > You may be able to convert this into a test case on the standard data > sources, e.g. by combining the scott and foodmart hsqldb databases. > > Julian > > > On May 26, 2016, at 1:46 AM, Jordan Halterman < > [email protected]> wrote: > > > > I've been doing some experimenting with the Calcite Planner and am > curious > > about a specific plan that's being generated. I've set up two JDBC > schemas, > > one with "users" and "items" tables, and one with "orders" and > "purchases". > > The Planner is set up using the same rules that are used in > > CalcitePrepareImpl with EnumerableConvention. > > > > When I optimize a simple query that joins two tables within the same > > database, I get the expected plan, e.g.: > > > > SELECT u.id AS user_id, i.id AS item_id, i.name, i.description FROM > users u > > JOIN items i ON u.id = i.user_id > > > > JdbcToEnumerableConverter: rowcount = 1500.0, cumulative cost = {3050.0 > > rows, 5152.0 cpu, 0.0 io}, id = 43 > > JdbcProject(user_id=[$0], item_id=[$5], name=[$7], description=[$8]): > > rowcount = 1500.0, cumulative cost = {2900.0 rows, 5002.0 cpu, 0.0 io}, > id > > = 42 > > JdbcJoin(condition=[=($0, $6)], joinType=[inner]): rowcount = 1500.0, > > cumulative cost = {1700.0 rows, 202.0 cpu, 0.0 io}, id = 41 > > JdbcTableScan(table=[[USERS]]): rowcount = 100.0, cumulative cost = > > {100.0 rows, 101.0 cpu, 0.0 io}, id = 5 > > JdbcTableScan(table=[[ITEMS]]): rowcount = 100.0, cumulative cost = > > {100.0 rows, 101.0 cpu, 0.0 io}, id = 6 > > > > However, when I optimize a query that joins two tables in one database > with > > a table in another database, I don't get the plan I expect. None of the > > joins or filters are done in JDBC, only table scans e.g.: > > > > SELECT u.id, o.id AS order_id FROM users u JOIN orders o ON u.id = > > o.user_id JOIN purchases p ON o.id = p.order_id WHERE p.order_id < 50 > > > > EnumerableProject(id=[$6], order_id=[$0]): rowcount = 11250.0, cumulative > > cost = {29206.571923496576 rows, 22933.0 cpu, 0.0 io}, id = 141 > > EnumerableJoin(condition=[=($1, $6)], joinType=[inner]): rowcount = > > 11250.0, cumulative cost = {17956.571923496576 rows, 433.0 cpu, 0.0 io}, > id > > = 140 > > EnumerableJoin(condition=[=($0, $3)], joinType=[inner]): rowcount = > > 750.0, cumulative cost = {1531.517018598809 rows, 322.0 cpu, 0.0 io}, id > = > > 138 > > JdbcToEnumerableConverter: rowcount = 100.0, cumulative cost = > {110.0 > > rows, 111.0 cpu, 0.0 io}, id = 135 > > JdbcTableScan(table=[[ORDERS]]): rowcount = 100.0, cumulative cost > > = {100.0 rows, 101.0 cpu, 0.0 io}, id = 10 > > EnumerableFilter(condition=[<($1, 50)]): rowcount = 50.0, cumulative > > cost = {160.0 rows, 211.0 cpu, 0.0 io}, id = 137 > > JdbcToEnumerableConverter: rowcount = 100.0, cumulative cost = > > {110.0 rows, 111.0 cpu, 0.0 io}, id = 136 > > JdbcTableScan(table=[[PURCHASES]]): rowcount = 100.0, cumulative > > cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 12 > > JdbcToEnumerableConverter: rowcount = 100.0, cumulative cost = {110.0 > > rows, 111.0 cpu, 0.0 io}, id = 139 > > JdbcTableScan(table=[[USERS]]): rowcount = 100.0, cumulative cost = > > {100.0 rows, 101.0 cpu, 0.0 io}, id = 9 > > > > It seems to me that the EnumerableFilter and EnumerableJoin of ORDERS and > > PURCHASES (which each have the same JdbcConvention) could be done in > JDBC, > > i.e. something more like this: > > > > EnumerableProject > > EnumerableJoin > > JdbcToEnumerableConverter > > JdbcJoin > > JdbcTableScan(table=[[ORDERS]]: ...) > > JdbcFilter > > JdbcTableScan(table=[[PURCHASES]]: ...) > > JdbcToEnumerableConverter > > JdbcTableScan(table=[[USERS]]: ...) > > > > So, my question is, why is the join and even the filter not pushed down > to > > JDBC? When I log the trace, I see that the JdbcJoin and JdbcFilter have > an > > {inf} cost, but I'm ignorant as to why that is. Here's the trace: > > > > Set#0, type: RecordType(INTEGER ID, VARCHAR(1) NAME, VARCHAR(1) EMAIL, > > VARCHAR(1) CREATED_AT, VARCHAR(1) UPDATED_AT) > > rel#38:Subset#0.JDBC.site.[], best=rel#9, importance=0.7290000000000001 > > rel#9:JdbcTableScan.JDBC.site.[](table=[USERS]), rowcount=100.0, > cumulative > > cost={100.0 rows, 101.0 cpu, 0.0 io} > > rel#97:Subset#0.ENUMERABLE.[0], best=null, importance=0.7290000000000001 > > rel#100:Subset#0.ENUMERABLE.[], best=rel#120, > importance=0.7290000000000001 > > > rel#120:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#38:Subset#0.JDBC.site.[]), > > rowcount=100.0, cumulative cost={110.0 rows, 111.0 cpu, 0.0 io} > > Set#1, type: RecordType(INTEGER ID, INTEGER USER_ID) > > rel#39:Subset#1.JDBC.cart.[], best=rel#10, importance=0.7290000000000001 > > rel#10:JdbcTableScan.JDBC.cart.[](table=[ORDERS]), rowcount=100.0, > > cumulative cost={100.0 rows, 101.0 cpu, 0.0 io} > > rel#124:Subset#1.ENUMERABLE.[], best=rel#123, importance=0.6561 > > > rel#123:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#39:Subset#1.JDBC.cart.[]), > > rowcount=100.0, cumulative cost={110.0 rows, 111.0 cpu, 0.0 io} > > rel#127:Subset#1.JDBC.site.[], best=null, importance=0.5904900000000001 > > rel#131:Subset#1.ENUMERABLE.[0], best=null, importance=0.6561 > > Set#2, type: RecordType(INTEGER ID, INTEGER ORDER_ID, INTEGER ITEM_ID, > > VARCHAR(1) CREATED_AT) > > rel#40:Subset#2.JDBC.cart.[], best=rel#12, importance=0.6561 > > rel#12:JdbcTableScan.JDBC.cart.[](table=[PURCHASES]), rowcount=100.0, > > cumulative cost={100.0 rows, 101.0 cpu, 0.0 io} > > rel#91:Subset#2.JDBC.site.[], best=null, importance=0.531441 > > rel#102:Subset#2.ENUMERABLE.[], best=rel#114, > importance=0.5904900000000001 > > > rel#114:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#40:Subset#2.JDBC.cart.[]), > > rowcount=100.0, cumulative cost={110.0 rows, 111.0 cpu, 0.0 io} > > Set#3, type: RecordType(INTEGER ID, INTEGER ORDER_ID, INTEGER ITEM_ID, > > VARCHAR(1) CREATED_AT) > > rel#42:Subset#3.NONE.[], best=null, importance=0.7290000000000001 > > > rel#41:LogicalFilter.NONE.[](input=rel#40:Subset#2.JDBC.cart.[],condition=<($1, > > 50)), rowcount=50.0, cumulative cost={inf} > > rel#93:Subset#3.JDBC.site.[], best=null, importance=0.5904900000000001 > > > rel#92:JdbcFilter.JDBC.site.[](input=rel#91:Subset#2.JDBC.site.[],condition=<($1, > > 50)), rowcount=50.0, cumulative cost={inf} > > rel#104:Subset#3.ENUMERABLE.[], best=rel#103, importance=0.6561 > > > rel#103:EnumerableFilter.ENUMERABLE.[](input=rel#102:Subset#2.ENUMERABLE.[],condition=<($1, > > 50)), rowcount=50.0, cumulative cost={160.0 rows, 211.0 cpu, 0.0 io} > > > rel#115:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#93:Subset#3.JDBC.site.[]), > > rowcount=50.0, cumulative cost={inf} > > rel#132:Subset#3.ENUMERABLE.[1], best=null, importance=0.6561 > > Set#4, type: RecordType(INTEGER ID, VARCHAR(1) NAME, VARCHAR(1) EMAIL, > > VARCHAR(1) CREATED_AT, VARCHAR(1) UPDATED_AT, INTEGER ID0, INTEGER > USER_ID, > > INTEGER ID1, INTEGER ORDER_ID, INTEGER ITEM_ID, VARCHAR(1) CREATED_AT0) > > rel#44:Subset#4.NONE.[], best=null, importance=0.81 > > > rel#43:MultiJoin.NONE.[](input#0=rel#38:Subset#0.JDBC.site.[],input#1=rel#39:Subset#1.JDBC.cart.[],input#2=rel#42:Subset#3.NONE.[],joinFilter=AND(=($5, > > $8), =($0, $6)),isFullOuterJoin=false,joinTypes=[INNER, INNER, > > INNER],outerJoinConditions=[NULL, NULL, NULL],projFields=[ALL, ALL, > ALL]), > > rowcount=1.0, cumulative cost={inf} > > > rel#76:LogicalProject.NONE.[](input=rel#75:Subset#7.NONE.[],ID=$6,NAME=$7,EMAIL=$8,CREATED_AT=$9,UPDATED_AT=$10,ID0=$0,USER_ID=$1,ID1=$2,ORDER_ID=$3,ITEM_ID=$4,CREATED_AT0=$5), > > rowcount=11250.0, cumulative cost={inf} > > rel#49:Subset#4.JDBC.site.[], best=null, importance=0.81 > > > rel#88:JdbcProject.JDBC.site.[](input=rel#82:Subset#7.JDBC.site.[],ID=$6,NAME=$7,EMAIL=$8,CREATED_AT=$9,UPDATED_AT=$10,ID0=$0,USER_ID=$1,ID1=$2,ORDER_ID=$3,ITEM_ID=$4,CREATED_AT0=$5), > > rowcount=11250.0, cumulative cost={inf} > > rel#56:Subset#4.ENUMERABLE.[], best=rel#94, importance=0.9 > > > rel#94:EnumerableProject.ENUMERABLE.[](input=rel#86:Subset#7.ENUMERABLE.[],ID=$6,NAME=$7,EMAIL=$8,CREATED_AT=$9,UPDATED_AT=$10,ID0=$0,USER_ID=$1,ID1=$2,ORDER_ID=$3,ITEM_ID=$4,CREATED_AT0=$5), > > rowcount=11250.0, cumulative cost={29206.571923496576 rows, 124183.0 cpu, > > 0.0 io} > > > rel#113:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#49:Subset#4.JDBC.site.[]), > > rowcount=1.0, cumulative cost={inf} > > Set#5, type: RecordType(INTEGER id, INTEGER order_id) > > rel#46:Subset#5.NONE.[], best=null, importance=0.9 > > > rel#45:LogicalProject.NONE.[](input=rel#44:Subset#4.NONE.[],id=$0,order_id=$5), > > rowcount=1.0, cumulative cost={inf} > > > rel#81:LogicalProject.NONE.[](input=rel#75:Subset#7.NONE.[],id=$6,order_id=$0), > > rowcount=11250.0, cumulative cost={inf} > > > rel#95:LogicalProject.NONE.[](input=rel#86:Subset#7.ENUMERABLE.[],id=$6,order_id=$0), > > rowcount=11250.0, cumulative cost={inf} > > > rel#107:LogicalProject.NONE.[](input=rel#82:Subset#7.JDBC.site.[],id=$6,order_id=$0), > > rowcount=11250.0, cumulative cost={inf} > > rel#47:Subset#5.ENUMERABLE.[], best=rel#87, importance=1.0 > > > rel#48:AbstractConverter.ENUMERABLE.[](input=rel#46:Subset#5.NONE.[],convention=ENUMERABLE,sort=[]), > > rowcount=1.0, cumulative cost={inf} > > > rel#52:AbstractConverter.ENUMERABLE.[](input=rel#51:Subset#5.JDBC.site.[],convention=ENUMERABLE,sort=[]), > > rowcount=1.0, cumulative cost={inf} > > > rel#53:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#51:Subset#5.JDBC.site.[]), > > rowcount=1.0, cumulative cost={inf} > > > rel#57:EnumerableProject.ENUMERABLE.[](input=rel#56:Subset#4.ENUMERABLE.[],id=$0,order_id=$5), > > rowcount=11250.0, cumulative cost={40456.57192349658 rows, 146683.0 cpu, > > 0.0 io} > > > rel#87:EnumerableProject.ENUMERABLE.[](input=rel#86:Subset#7.ENUMERABLE.[],id=$6,order_id=$0), > > rowcount=11250.0, cumulative cost={29206.571923496576 rows, 22933.0 cpu, > > 0.0 io} > > rel#51:Subset#5.JDBC.site.[], best=null, importance=0.9 > > > rel#50:JdbcProject.JDBC.site.[](input=rel#49:Subset#4.JDBC.site.[],id=$0,order_id=$5), > > rowcount=1.0, cumulative cost={inf} > > > rel#83:JdbcProject.JDBC.site.[](input=rel#82:Subset#7.JDBC.site.[],id=$6,order_id=$0), > > rowcount=11250.0, cumulative cost={inf} > > Set#6, type: RecordType(INTEGER ID, INTEGER USER_ID, INTEGER ID0, INTEGER > > ORDER_ID, INTEGER ITEM_ID, VARCHAR(1) CREATED_AT) > > rel#73:Subset#6.NONE.[], best=null, importance=0.6561 > > > rel#60:LogicalJoin.NONE.[](left=rel#39:Subset#1.JDBC.cart.[],right=rel#42:Subset#3.NONE.[],condition==($0, > > $3),joinType=inner), rowcount=750.0, cumulative cost={inf} > > rel#89:Subset#6.JDBC.site.[], best=null, importance=0.6561 > > > rel#128:JdbcJoin.JDBC.site.[](left=rel#127:Subset#1.JDBC.site.[],right=rel#93:Subset#3.JDBC.site.[],condition==($0, > > $3),joinType=inner), rowcount=750.0, cumulative cost={inf} > > rel#96:Subset#6.ENUMERABLE.[1], best=null, importance=0.7290000000000001 > > rel#99:Subset#6.ENUMERABLE.[], best=rel#134, > importance=0.7290000000000001 > > > rel#130:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#89:Subset#6.JDBC.site.[]), > > rowcount=750.0, cumulative cost={inf} > > rel#133:EnumerableMergeJoin.ENUMERABLE.[[0], > > > [3]](left=rel#131:Subset#1.ENUMERABLE.[0],right=rel#132:Subset#3.ENUMERABLE.[1],condition==($0, > > $3),joinType=inner), rowcount=750.0, cumulative cost={inf} > > > rel#134:EnumerableJoin.ENUMERABLE.[](left=rel#124:Subset#1.ENUMERABLE.[],right=rel#104:Subset#3.ENUMERABLE.[],condition==($0, > > $3),joinType=inner), rowcount=750.0, cumulative cost={1531.517018598809 > > rows, 322.0 cpu, 0.0 io} > > Set#7, type: RecordType(INTEGER ID, INTEGER USER_ID, INTEGER ID0, INTEGER > > ORDER_ID, INTEGER ITEM_ID, VARCHAR(1) CREATED_AT, INTEGER ID1, VARCHAR(1) > > NAME, VARCHAR(1) EMAIL, VARCHAR(1) CREATED_AT0, VARCHAR(1) UPDATED_AT) > > rel#75:Subset#7.NONE.[], best=null, importance=0.7290000000000001 > > > rel#74:LogicalJoin.NONE.[](left=rel#73:Subset#6.NONE.[],right=rel#38:Subset#0.JDBC.site.[],condition==($6, > > $1),joinType=inner), rowcount=11250.0, cumulative cost={inf} > > rel#82:Subset#7.JDBC.site.[], best=null, importance=0.7290000000000001 > > > rel#90:JdbcJoin.JDBC.site.[](left=rel#89:Subset#6.JDBC.site.[],right=rel#38:Subset#0.JDBC.site.[],condition==($6, > > $1),joinType=inner), rowcount=11250.0, cumulative cost={inf} > > rel#86:Subset#7.ENUMERABLE.[], best=rel#101, importance=0.81 > > rel#98:EnumerableMergeJoin.ENUMERABLE.[[1], > > > [6]](left=rel#96:Subset#6.ENUMERABLE.[1],right=rel#97:Subset#0.ENUMERABLE.[0],condition==($1, > > $6),joinType=inner), rowcount=11250.0, cumulative cost={inf} > > > rel#101:EnumerableJoin.ENUMERABLE.[](left=rel#99:Subset#6.ENUMERABLE.[],right=rel#100:Subset#0.ENUMERABLE.[],condition==($1, > > $6),joinType=inner), rowcount=11250.0, cumulative > cost={17956.571923496576 > > rows, 433.0 cpu, 0.0 io} > > > rel#109:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#82:Subset#7.JDBC.site.[]), > > rowcount=11250.0, cumulative cost={inf} > >
