Re: Join not being pushed to JDBC database when also joining across databases

Jordan Halterman Sat, 28 May 2016 19:15:07 -0700

I actually got a chance to do some more digging in to this over the
weekend. When I step through the VolcanoPlanner in my debugger, I realize
that while a JdbcFilter rule is being created for each unique JdbcConvention,
only one of those two rules is actually being fired. This seems to be a
result of the JDBC rule names. JdbcToEnumerableConverterRule is the only
JDBC rule using the convention in its description. All other JDBC rules
just have static descriptions, e.g. JdbcFilterRule or JdbcProjectRule. This
results in the rules being fired only for *one* of the JdbcConventions. I
verified that the expected plan is output when I change all the JDBC rules
to use a unique description-per-convention.


Does this make sense?

I will open a Jira ticket for this

On Thu, May 26, 2016 at 1:21 PM, Jordan Halterman <
[email protected]> wrote:

> Glad we're seeing the same thing. I can take a shot at reproducing it in a
> test.
>
> On Thu, May 26, 2016 at 1:17 PM, Julian Hyde <[email protected]> wrote:
>
>> I agree with everything you said. The best plan is to join inside the
>> JDBC source, and we’d hope that Calcite could find that plan.
>>
>> It is curious that the JdbcFilter has infinite cost. I can’t tell by
>> reading the trace output exactly why. I wonder whether it has failed to
>> make a JdbcTableScan of the correct table within the correct JDBC database
>> (which, as you note, is a particular instance of JdbcConvention).
>>
>> There’s clearly a Calcite bug here, likely in the JDBC adapter. Can you
>> log it?
>>
>> You may be able to convert this into a test case on the standard data
>> sources, e.g. by combining the scott and foodmart hsqldb databases.
>>
>> Julian
>>
>> > On May 26, 2016, at 1:46 AM, Jordan Halterman <
>> [email protected]> wrote:
>> >
>> > I've been doing some experimenting with the Calcite Planner and am
>> curious
>> > about a specific plan that's being generated. I've set up two JDBC
>> schemas,
>> > one with "users" and "items" tables, and one with "orders" and
>> "purchases".
>> > The Planner is set up using the same rules that are used in
>> > CalcitePrepareImpl with EnumerableConvention.
>> >
>> > When I optimize a simple query that joins two tables within the same
>> > database, I get the expected plan, e.g.:
>> >
>> > SELECT u.id AS user_id, i.id AS item_id, i.name, i.description FROM
>> users u
>> > JOIN items i ON u.id = i.user_id
>> >
>> > JdbcToEnumerableConverter: rowcount = 1500.0, cumulative cost = {3050.0
>> > rows, 5152.0 cpu, 0.0 io}, id = 43
>> >  JdbcProject(user_id=[$0], item_id=[$5], name=[$7], description=[$8]):
>> > rowcount = 1500.0, cumulative cost = {2900.0 rows, 5002.0 cpu, 0.0 io},
>> id
>> > = 42
>> >    JdbcJoin(condition=[=($0, $6)], joinType=[inner]): rowcount = 1500.0,
>> > cumulative cost = {1700.0 rows, 202.0 cpu, 0.0 io}, id = 41
>> >      JdbcTableScan(table=[[USERS]]): rowcount = 100.0, cumulative cost =
>> > {100.0 rows, 101.0 cpu, 0.0 io}, id = 5
>> >      JdbcTableScan(table=[[ITEMS]]): rowcount = 100.0, cumulative cost =
>> > {100.0 rows, 101.0 cpu, 0.0 io}, id = 6
>> >
>> > However, when I optimize a query that joins two tables in one database
>> with
>> > a table in another database, I don't get the plan I expect. None of the
>> > joins or filters are done in JDBC, only table scans e.g.:
>> >
>> > SELECT u.id, o.id AS order_id FROM users u JOIN orders o ON u.id =
>> > o.user_id JOIN purchases p ON o.id = p.order_id WHERE p.order_id < 50
>> >
>> > EnumerableProject(id=[$6], order_id=[$0]): rowcount = 11250.0,
>> cumulative
>> > cost = {29206.571923496576 rows, 22933.0 cpu, 0.0 io}, id = 141
>> >  EnumerableJoin(condition=[=($1, $6)], joinType=[inner]): rowcount =
>> > 11250.0, cumulative cost = {17956.571923496576 rows, 433.0 cpu, 0.0
>> io}, id
>> > = 140
>> >    EnumerableJoin(condition=[=($0, $3)], joinType=[inner]): rowcount =
>> > 750.0, cumulative cost = {1531.517018598809 rows, 322.0 cpu, 0.0 io},
>> id =
>> > 138
>> >      JdbcToEnumerableConverter: rowcount = 100.0, cumulative cost =
>> {110.0
>> > rows, 111.0 cpu, 0.0 io}, id = 135
>> >        JdbcTableScan(table=[[ORDERS]]): rowcount = 100.0, cumulative
>> cost
>> > = {100.0 rows, 101.0 cpu, 0.0 io}, id = 10
>> >      EnumerableFilter(condition=[<($1, 50)]): rowcount = 50.0,
>> cumulative
>> > cost = {160.0 rows, 211.0 cpu, 0.0 io}, id = 137
>> >        JdbcToEnumerableConverter: rowcount = 100.0, cumulative cost =
>> > {110.0 rows, 111.0 cpu, 0.0 io}, id = 136
>> >          JdbcTableScan(table=[[PURCHASES]]): rowcount = 100.0,
>> cumulative
>> > cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 12
>> >    JdbcToEnumerableConverter: rowcount = 100.0, cumulative cost = {110.0
>> > rows, 111.0 cpu, 0.0 io}, id = 139
>> >      JdbcTableScan(table=[[USERS]]): rowcount = 100.0, cumulative cost =
>> > {100.0 rows, 101.0 cpu, 0.0 io}, id = 9
>> >
>> > It seems to me that the EnumerableFilter and EnumerableJoin of ORDERS
>> and
>> > PURCHASES (which each have the same JdbcConvention) could be done in
>> JDBC,
>> > i.e. something more like this:
>> >
>> > EnumerableProject
>> >  EnumerableJoin
>> >    JdbcToEnumerableConverter
>> >      JdbcJoin
>> >        JdbcTableScan(table=[[ORDERS]]: ...)
>> >        JdbcFilter
>> >          JdbcTableScan(table=[[PURCHASES]]: ...)
>> >    JdbcToEnumerableConverter
>> >      JdbcTableScan(table=[[USERS]]: ...)
>> >
>> > So, my question is, why is the join and even the filter not pushed down
>> to
>> > JDBC? When I log the trace, I see that the JdbcJoin and JdbcFilter have
>> an
>> > {inf} cost, but I'm ignorant as to why that is. Here's the trace:
>> >
>> > Set#0, type: RecordType(INTEGER ID, VARCHAR(1) NAME, VARCHAR(1) EMAIL,
>> > VARCHAR(1) CREATED_AT, VARCHAR(1) UPDATED_AT)
>> > rel#38:Subset#0.JDBC.site.[], best=rel#9, importance=0.7290000000000001
>> > rel#9:JdbcTableScan.JDBC.site.[](table=[USERS]), rowcount=100.0,
>> cumulative
>> > cost={100.0 rows, 101.0 cpu, 0.0 io}
>> > rel#97:Subset#0.ENUMERABLE.[0], best=null, importance=0.7290000000000001
>> > rel#100:Subset#0.ENUMERABLE.[], best=rel#120,
>> importance=0.7290000000000001
>> >
>> rel#120:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#38:Subset#0.JDBC.site.[]),
>> > rowcount=100.0, cumulative cost={110.0 rows, 111.0 cpu, 0.0 io}
>> > Set#1, type: RecordType(INTEGER ID, INTEGER USER_ID)
>> > rel#39:Subset#1.JDBC.cart.[], best=rel#10, importance=0.7290000000000001
>> > rel#10:JdbcTableScan.JDBC.cart.[](table=[ORDERS]), rowcount=100.0,
>> > cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
>> > rel#124:Subset#1.ENUMERABLE.[], best=rel#123, importance=0.6561
>> >
>> rel#123:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#39:Subset#1.JDBC.cart.[]),
>> > rowcount=100.0, cumulative cost={110.0 rows, 111.0 cpu, 0.0 io}
>> > rel#127:Subset#1.JDBC.site.[], best=null, importance=0.5904900000000001
>> > rel#131:Subset#1.ENUMERABLE.[0], best=null, importance=0.6561
>> > Set#2, type: RecordType(INTEGER ID, INTEGER ORDER_ID, INTEGER ITEM_ID,
>> > VARCHAR(1) CREATED_AT)
>> > rel#40:Subset#2.JDBC.cart.[], best=rel#12, importance=0.6561
>> > rel#12:JdbcTableScan.JDBC.cart.[](table=[PURCHASES]), rowcount=100.0,
>> > cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
>> > rel#91:Subset#2.JDBC.site.[], best=null, importance=0.531441
>> > rel#102:Subset#2.ENUMERABLE.[], best=rel#114,
>> importance=0.5904900000000001
>> >
>> rel#114:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#40:Subset#2.JDBC.cart.[]),
>> > rowcount=100.0, cumulative cost={110.0 rows, 111.0 cpu, 0.0 io}
>> > Set#3, type: RecordType(INTEGER ID, INTEGER ORDER_ID, INTEGER ITEM_ID,
>> > VARCHAR(1) CREATED_AT)
>> > rel#42:Subset#3.NONE.[], best=null, importance=0.7290000000000001
>> >
>> rel#41:LogicalFilter.NONE.[](input=rel#40:Subset#2.JDBC.cart.[],condition=<($1,
>> > 50)), rowcount=50.0, cumulative cost={inf}
>> > rel#93:Subset#3.JDBC.site.[], best=null, importance=0.5904900000000001
>> >
>> rel#92:JdbcFilter.JDBC.site.[](input=rel#91:Subset#2.JDBC.site.[],condition=<($1,
>> > 50)), rowcount=50.0, cumulative cost={inf}
>> > rel#104:Subset#3.ENUMERABLE.[], best=rel#103, importance=0.6561
>> >
>> rel#103:EnumerableFilter.ENUMERABLE.[](input=rel#102:Subset#2.ENUMERABLE.[],condition=<($1,
>> > 50)), rowcount=50.0, cumulative cost={160.0 rows, 211.0 cpu, 0.0 io}
>> >
>> rel#115:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#93:Subset#3.JDBC.site.[]),
>> > rowcount=50.0, cumulative cost={inf}
>> > rel#132:Subset#3.ENUMERABLE.[1], best=null, importance=0.6561
>> > Set#4, type: RecordType(INTEGER ID, VARCHAR(1) NAME, VARCHAR(1) EMAIL,
>> > VARCHAR(1) CREATED_AT, VARCHAR(1) UPDATED_AT, INTEGER ID0, INTEGER
>> USER_ID,
>> > INTEGER ID1, INTEGER ORDER_ID, INTEGER ITEM_ID, VARCHAR(1) CREATED_AT0)
>> > rel#44:Subset#4.NONE.[], best=null, importance=0.81
>> >
>> rel#43:MultiJoin.NONE.[](input#0=rel#38:Subset#0.JDBC.site.[],input#1=rel#39:Subset#1.JDBC.cart.[],input#2=rel#42:Subset#3.NONE.[],joinFilter=AND(=($5,
>> > $8), =($0, $6)),isFullOuterJoin=false,joinTypes=[INNER, INNER,
>> > INNER],outerJoinConditions=[NULL, NULL, NULL],projFields=[ALL, ALL,
>> ALL]),
>> > rowcount=1.0, cumulative cost={inf}
>> >
>> rel#76:LogicalProject.NONE.[](input=rel#75:Subset#7.NONE.[],ID=$6,NAME=$7,EMAIL=$8,CREATED_AT=$9,UPDATED_AT=$10,ID0=$0,USER_ID=$1,ID1=$2,ORDER_ID=$3,ITEM_ID=$4,CREATED_AT0=$5),
>> > rowcount=11250.0, cumulative cost={inf}
>> > rel#49:Subset#4.JDBC.site.[], best=null, importance=0.81
>> >
>> rel#88:JdbcProject.JDBC.site.[](input=rel#82:Subset#7.JDBC.site.[],ID=$6,NAME=$7,EMAIL=$8,CREATED_AT=$9,UPDATED_AT=$10,ID0=$0,USER_ID=$1,ID1=$2,ORDER_ID=$3,ITEM_ID=$4,CREATED_AT0=$5),
>> > rowcount=11250.0, cumulative cost={inf}
>> > rel#56:Subset#4.ENUMERABLE.[], best=rel#94, importance=0.9
>> >
>> rel#94:EnumerableProject.ENUMERABLE.[](input=rel#86:Subset#7.ENUMERABLE.[],ID=$6,NAME=$7,EMAIL=$8,CREATED_AT=$9,UPDATED_AT=$10,ID0=$0,USER_ID=$1,ID1=$2,ORDER_ID=$3,ITEM_ID=$4,CREATED_AT0=$5),
>> > rowcount=11250.0, cumulative cost={29206.571923496576 rows, 124183.0
>> cpu,
>> > 0.0 io}
>> >
>> rel#113:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#49:Subset#4.JDBC.site.[]),
>> > rowcount=1.0, cumulative cost={inf}
>> > Set#5, type: RecordType(INTEGER id, INTEGER order_id)
>> > rel#46:Subset#5.NONE.[], best=null, importance=0.9
>> >
>> rel#45:LogicalProject.NONE.[](input=rel#44:Subset#4.NONE.[],id=$0,order_id=$5),
>> > rowcount=1.0, cumulative cost={inf}
>> >
>> rel#81:LogicalProject.NONE.[](input=rel#75:Subset#7.NONE.[],id=$6,order_id=$0),
>> > rowcount=11250.0, cumulative cost={inf}
>> >
>> rel#95:LogicalProject.NONE.[](input=rel#86:Subset#7.ENUMERABLE.[],id=$6,order_id=$0),
>> > rowcount=11250.0, cumulative cost={inf}
>> >
>> rel#107:LogicalProject.NONE.[](input=rel#82:Subset#7.JDBC.site.[],id=$6,order_id=$0),
>> > rowcount=11250.0, cumulative cost={inf}
>> > rel#47:Subset#5.ENUMERABLE.[], best=rel#87, importance=1.0
>> >
>> rel#48:AbstractConverter.ENUMERABLE.[](input=rel#46:Subset#5.NONE.[],convention=ENUMERABLE,sort=[]),
>> > rowcount=1.0, cumulative cost={inf}
>> >
>> rel#52:AbstractConverter.ENUMERABLE.[](input=rel#51:Subset#5.JDBC.site.[],convention=ENUMERABLE,sort=[]),
>> > rowcount=1.0, cumulative cost={inf}
>> >
>> rel#53:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#51:Subset#5.JDBC.site.[]),
>> > rowcount=1.0, cumulative cost={inf}
>> >
>> rel#57:EnumerableProject.ENUMERABLE.[](input=rel#56:Subset#4.ENUMERABLE.[],id=$0,order_id=$5),
>> > rowcount=11250.0, cumulative cost={40456.57192349658 rows, 146683.0 cpu,
>> > 0.0 io}
>> >
>> rel#87:EnumerableProject.ENUMERABLE.[](input=rel#86:Subset#7.ENUMERABLE.[],id=$6,order_id=$0),
>> > rowcount=11250.0, cumulative cost={29206.571923496576 rows, 22933.0 cpu,
>> > 0.0 io}
>> > rel#51:Subset#5.JDBC.site.[], best=null, importance=0.9
>> >
>> rel#50:JdbcProject.JDBC.site.[](input=rel#49:Subset#4.JDBC.site.[],id=$0,order_id=$5),
>> > rowcount=1.0, cumulative cost={inf}
>> >
>> rel#83:JdbcProject.JDBC.site.[](input=rel#82:Subset#7.JDBC.site.[],id=$6,order_id=$0),
>> > rowcount=11250.0, cumulative cost={inf}
>> > Set#6, type: RecordType(INTEGER ID, INTEGER USER_ID, INTEGER ID0,
>> INTEGER
>> > ORDER_ID, INTEGER ITEM_ID, VARCHAR(1) CREATED_AT)
>> > rel#73:Subset#6.NONE.[], best=null, importance=0.6561
>> >
>> rel#60:LogicalJoin.NONE.[](left=rel#39:Subset#1.JDBC.cart.[],right=rel#42:Subset#3.NONE.[],condition==($0,
>> > $3),joinType=inner), rowcount=750.0, cumulative cost={inf}
>> > rel#89:Subset#6.JDBC.site.[], best=null, importance=0.6561
>> >
>> rel#128:JdbcJoin.JDBC.site.[](left=rel#127:Subset#1.JDBC.site.[],right=rel#93:Subset#3.JDBC.site.[],condition==($0,
>> > $3),joinType=inner), rowcount=750.0, cumulative cost={inf}
>> > rel#96:Subset#6.ENUMERABLE.[1], best=null, importance=0.7290000000000001
>> > rel#99:Subset#6.ENUMERABLE.[], best=rel#134,
>> importance=0.7290000000000001
>> >
>> rel#130:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#89:Subset#6.JDBC.site.[]),
>> > rowcount=750.0, cumulative cost={inf}
>> > rel#133:EnumerableMergeJoin.ENUMERABLE.[[0],
>> >
>> [3]](left=rel#131:Subset#1.ENUMERABLE.[0],right=rel#132:Subset#3.ENUMERABLE.[1],condition==($0,
>> > $3),joinType=inner), rowcount=750.0, cumulative cost={inf}
>> >
>> rel#134:EnumerableJoin.ENUMERABLE.[](left=rel#124:Subset#1.ENUMERABLE.[],right=rel#104:Subset#3.ENUMERABLE.[],condition==($0,
>> > $3),joinType=inner), rowcount=750.0, cumulative cost={1531.517018598809
>> > rows, 322.0 cpu, 0.0 io}
>> > Set#7, type: RecordType(INTEGER ID, INTEGER USER_ID, INTEGER ID0,
>> INTEGER
>> > ORDER_ID, INTEGER ITEM_ID, VARCHAR(1) CREATED_AT, INTEGER ID1,
>> VARCHAR(1)
>> > NAME, VARCHAR(1) EMAIL, VARCHAR(1) CREATED_AT0, VARCHAR(1) UPDATED_AT)
>> > rel#75:Subset#7.NONE.[], best=null, importance=0.7290000000000001
>> >
>> rel#74:LogicalJoin.NONE.[](left=rel#73:Subset#6.NONE.[],right=rel#38:Subset#0.JDBC.site.[],condition==($6,
>> > $1),joinType=inner), rowcount=11250.0, cumulative cost={inf}
>> > rel#82:Subset#7.JDBC.site.[], best=null, importance=0.7290000000000001
>> >
>> rel#90:JdbcJoin.JDBC.site.[](left=rel#89:Subset#6.JDBC.site.[],right=rel#38:Subset#0.JDBC.site.[],condition==($6,
>> > $1),joinType=inner), rowcount=11250.0, cumulative cost={inf}
>> > rel#86:Subset#7.ENUMERABLE.[], best=rel#101, importance=0.81
>> > rel#98:EnumerableMergeJoin.ENUMERABLE.[[1],
>> >
>> [6]](left=rel#96:Subset#6.ENUMERABLE.[1],right=rel#97:Subset#0.ENUMERABLE.[0],condition==($1,
>> > $6),joinType=inner), rowcount=11250.0, cumulative cost={inf}
>> >
>> rel#101:EnumerableJoin.ENUMERABLE.[](left=rel#99:Subset#6.ENUMERABLE.[],right=rel#100:Subset#0.ENUMERABLE.[],condition==($1,
>> > $6),joinType=inner), rowcount=11250.0, cumulative
>> cost={17956.571923496576
>> > rows, 433.0 cpu, 0.0 io}
>> >
>> rel#109:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#82:Subset#7.JDBC.site.[]),
>> > rowcount=11250.0, cumulative cost={inf}
>>
>>
>

Re: Join not being pushed to JDBC database when also joining across databases

Reply via email to