Hi Zack,
Looks like it is a regression.
Are you able to provide a reproducible test case? You can log a JIRA along with
the test case, so people can do the root cause analysis.
Thanks,
Haisheng Yuan
On 2022/03/12 01:26:53 "Gramana, Zachary (GE Digital)" wrote:
> Hello all!
>
> I would appreciate some help/guidance with troubleshooting a big performance
> regression we saw on our UNION queries after updating our application from
> 1.24 to 1.29. Many of the tables in the queries are wide (> 50 columns) so we
> use a custom push-down Project class. These custom Projects are being
> eliminated in 1.29 UNION queries--though not when replacing UNION with UNION
> ALL.
>
> The examples below illustrate minimal queries that reproduce the issue our
> test database.
>
> When compared to the UNION query times using 1.24, the prepared UNION
> statements take 2-2.5 times longer in 1.29. The prepare time is around 2
> times longer as well.
>
> In the examples below, the UNION and UNION ALL queries generate the same
> result set in our test database. All Calcite tables have default collations
> ("ISO-8859-1$en_US"). Several are rather wide, with many > 50 columns.
>
> In 1.29 for a simple query, both UNION and UNION ALL produce the same query
> plan, with the exception of EnumerableUnion's `all` parameter:
>
> 1.29
> EXPLAIN PLAN FOR
> SELECT Id FROM t1
> UNION ALL
> SELECT Id FROM t2
> -----------------------------------------------------------------------------------------------------
> EnumerableUnion(all=[true])
> MyEnumerableConverter
> MyProject(Id=[$0])
> MyTableScan(table=[[MySchema, t1]])
> MyEnumerableConverter
> MyProject(Id=[$0])
> MyTableScan(table=[[MySchema, t2]])
>
> 1.29
> EXPLAIN PLAN FOR
> SELECT Id FROM t1
> UNION
> SELECT Id FROM t2
> -----------------------------------------------------------------------------------------------------
> EnumerableUnion(all=[false])
> MyEnumerableConverter
> MyProject(Id=[$0])
> MyTableScan(table=[[MySchema, t1]])
> MyEnumerableConverter
> MyProject(Id=[$0])
> MyTableScan(table=[[MySchema, t2]])
>
> The next examples add a JOIN to one of the tables in order to make this a
> slightly more complicated query and trigger the issue. Unlike in 1.24, the
> plans in 1.29 for UNION and UNION ALL now vary significantly:
>
> 1.24 (like the above queries, the UNION and UNION ALL plans differ only by
> the `all` parameter, so the UNION ALL plan for 1.24 is omitted)
> EXPLAIN PLAN FOR
> SELECT Id FROM t1
> UNION
> SELECT t3.Id FROM t2 JOIN t3 ON (t3.Id = t2.t3_Id)
> -----------------------------------------------------------------------------------------------------
> EnumerableUnion(all=[false])
> MyEnumerableConverter
> MyProject(Id=[$0])
> MyTableScan(table=[[MySchema, t1]])
> EnumerableCalc(expr#0..1=[{inputs}], Id=[$t1])
> EnumerableMergeJoin(condition=[=($0, $1)], joinType=[inner])
> EnumerableSort(sort0=[$0], dir0=[ASC])
> EnumerableCalc(expr#0..262=[{inputs}], expr#263=[CAST($t178):BIGINT
> NOT NULL], t3_Id0=[$t263])
> MyEnumerableConverter
> MyTableScan(table=[[MySchema, t2]])
> EnumerableSort(sort0=[$0], dir0=[ASC])
> MyEnumerableConverter
> MyProject(Id=[$0])
> MyTableScan(table=[[MySchema, t3]])
>
> 1.29 (note this UNION ALL plan is nearly identical plan to the above `UNION`
> plan in 1.24)
> EXPLAIN PLAN FOR
> SELECT Id FROM t1
> UNION ALL
> SELECT t3.Id FROM t2 JOIN t3 ON (t3.Id = t2.t3_Id)
> -----------------------------------------------------------------------------------------------------
> EnumerableUnion(all=[true])
> MyEnumerableConverter
> MyProject(Id=[$0])
> MyTableScan(table=[[MySchema, t1]])
> EnumerableCalc(expr#0..1=[{inputs}], Id=[$t1])
> EnumerableMergeJoin(condition=[=($0, $1)], joinType=[inner])
> EnumerableSort(sort0=[$0], dir0=[ASC])
> EnumerableCalc(expr#0..262=[{inputs}], expr#263=[CAST($t178):BIGINT
> NOT NULL], t3_Id0=[$t263])
> MyEnumerableConverter
> MyTableScan(table=[[MySchema, t2]])
> EnumerableSort(sort0=[$0], dir0=[ASC])
> MyEnumerableConverter
> MyProject(Id=[$0])
> MyTableScan(table=[[MySchema, t3]])
>
> 1.29 (note the loss of the `MyProject` node above t3's MyTableScan, and now
> the inputs for the post-MergeJoin EnumerableCalc node are very wide compared
> to before)
> EXPLAIN PLAN FOR
> SELECT Id FROM t1
> UNION
> SELECT t3.Id FROM t2 JOIN t3 ON (t3.Id = t2.t3_Id)
> -----------------------------------------------------------------------------------------------------
> EnumerableUnion(all=[false])
> MyEnumerableConverter
> MyProject(Id=[$0])
> MyTableScan(table=[[MySchema, t1]])
> EnumerableCalc(expr#0..402=[{inputs}], Id=[$t264])
> EnumerableMergeJoin(condition=[=($263, $264)], joinType=[inner])
> EnumerableSort(sort0=[$263], dir0=[ASC])
> EnumerableCalc(expr#0..262=[{inputs}], expr#263=[CAST($t178):BIGINT
> NOT NULL], proj#0..263=[{exprs}])
> MyEnumerableConverter
> MyTableScan(table=[[MySchema, t2]])
> EnumerableSort(sort0=[$0], dir0=[ASC])
> MyEnumerableConverter
> MyTableScan(table=[[MySchema, t3]])
>
> I've tried removing various rules and even restricting them, but the custom
> Projects are still removed in the EnumerableMergeJoin despite setting the
> self-cost to zero.
>
> Am I missing something? Any suggestions on where to look next?
>
> Thanks for any assistance/tips.
>
> Best,
> Zack
>
>