Hello all!
I would appreciate some help/guidance with troubleshooting a big performance
regression we saw on our UNION queries after updating our application from 1.24
to 1.29. Many of the tables in the queries are wide (> 50 columns) so we use a
custom push-down Project class. These custom Projects are being eliminated in
1.29 UNION queries--though not when replacing UNION with UNION ALL.
The examples below illustrate minimal queries that reproduce the issue our test
database.
When compared to the UNION query times using 1.24, the prepared UNION
statements take 2-2.5 times longer in 1.29. The prepare time is around 2 times
longer as well.
In the examples below, the UNION and UNION ALL queries generate the same result
set in our test database. All Calcite tables have default collations
("ISO-8859-1$en_US"). Several are rather wide, with many > 50 columns.
In 1.29 for a simple query, both UNION and UNION ALL produce the same query
plan, with the exception of EnumerableUnion's `all` parameter:
1.29
EXPLAIN PLAN FOR
SELECT Id FROM t1
UNION ALL
SELECT Id FROM t2
-----------------------------------------------------------------------------------------------------
EnumerableUnion(all=[true])
MyEnumerableConverter
MyProject(Id=[$0])
MyTableScan(table=[[MySchema, t1]])
MyEnumerableConverter
MyProject(Id=[$0])
MyTableScan(table=[[MySchema, t2]])
1.29
EXPLAIN PLAN FOR
SELECT Id FROM t1
UNION
SELECT Id FROM t2
-----------------------------------------------------------------------------------------------------
EnumerableUnion(all=[false])
MyEnumerableConverter
MyProject(Id=[$0])
MyTableScan(table=[[MySchema, t1]])
MyEnumerableConverter
MyProject(Id=[$0])
MyTableScan(table=[[MySchema, t2]])
The next examples add a JOIN to one of the tables in order to make this a
slightly more complicated query and trigger the issue. Unlike in 1.24, the
plans in 1.29 for UNION and UNION ALL now vary significantly:
1.24 (like the above queries, the UNION and UNION ALL plans differ only by the
`all` parameter, so the UNION ALL plan for 1.24 is omitted)
EXPLAIN PLAN FOR
SELECT Id FROM t1
UNION
SELECT t3.Id FROM t2 JOIN t3 ON (t3.Id = t2.t3_Id)
-----------------------------------------------------------------------------------------------------
EnumerableUnion(all=[false])
MyEnumerableConverter
MyProject(Id=[$0])
MyTableScan(table=[[MySchema, t1]])
EnumerableCalc(expr#0..1=[{inputs}], Id=[$t1])
EnumerableMergeJoin(condition=[=($0, $1)], joinType=[inner])
EnumerableSort(sort0=[$0], dir0=[ASC])
EnumerableCalc(expr#0..262=[{inputs}], expr#263=[CAST($t178):BIGINT NOT
NULL], t3_Id0=[$t263])
MyEnumerableConverter
MyTableScan(table=[[MySchema, t2]])
EnumerableSort(sort0=[$0], dir0=[ASC])
MyEnumerableConverter
MyProject(Id=[$0])
MyTableScan(table=[[MySchema, t3]])
1.29 (note this UNION ALL plan is nearly identical plan to the above `UNION`
plan in 1.24)
EXPLAIN PLAN FOR
SELECT Id FROM t1
UNION ALL
SELECT t3.Id FROM t2 JOIN t3 ON (t3.Id = t2.t3_Id)
-----------------------------------------------------------------------------------------------------
EnumerableUnion(all=[true])
MyEnumerableConverter
MyProject(Id=[$0])
MyTableScan(table=[[MySchema, t1]])
EnumerableCalc(expr#0..1=[{inputs}], Id=[$t1])
EnumerableMergeJoin(condition=[=($0, $1)], joinType=[inner])
EnumerableSort(sort0=[$0], dir0=[ASC])
EnumerableCalc(expr#0..262=[{inputs}], expr#263=[CAST($t178):BIGINT NOT
NULL], t3_Id0=[$t263])
MyEnumerableConverter
MyTableScan(table=[[MySchema, t2]])
EnumerableSort(sort0=[$0], dir0=[ASC])
MyEnumerableConverter
MyProject(Id=[$0])
MyTableScan(table=[[MySchema, t3]])
1.29 (note the loss of the `MyProject` node above t3's MyTableScan, and now the
inputs for the post-MergeJoin EnumerableCalc node are very wide compared to
before)
EXPLAIN PLAN FOR
SELECT Id FROM t1
UNION
SELECT t3.Id FROM t2 JOIN t3 ON (t3.Id = t2.t3_Id)
-----------------------------------------------------------------------------------------------------
EnumerableUnion(all=[false])
MyEnumerableConverter
MyProject(Id=[$0])
MyTableScan(table=[[MySchema, t1]])
EnumerableCalc(expr#0..402=[{inputs}], Id=[$t264])
EnumerableMergeJoin(condition=[=($263, $264)], joinType=[inner])
EnumerableSort(sort0=[$263], dir0=[ASC])
EnumerableCalc(expr#0..262=[{inputs}], expr#263=[CAST($t178):BIGINT NOT
NULL], proj#0..263=[{exprs}])
MyEnumerableConverter
MyTableScan(table=[[MySchema, t2]])
EnumerableSort(sort0=[$0], dir0=[ASC])
MyEnumerableConverter
MyTableScan(table=[[MySchema, t3]])
I've tried removing various rules and even restricting them, but the custom
Projects are still removed in the EnumerableMergeJoin despite setting the
self-cost to zero.
Am I missing something? Any suggestions on where to look next?
Thanks for any assistance/tips.
Best,
Zack