Hello, At first glance, it could also be possible that the regression is originated in the RelFieldTrimmer. Just a hypothesis.
Best, Ruben On Sat, Mar 12, 2022 at 6:35 AM Haisheng Yuan <[email protected]> wrote: > Hi Zack, > > Looks like it is a regression. > Are you able to provide a reproducible test case? You can log a JIRA along > with the test case, so people can do the root cause analysis. > > Thanks, > Haisheng Yuan > > On 2022/03/12 01:26:53 "Gramana, Zachary (GE Digital)" wrote: > > Hello all! > > > > I would appreciate some help/guidance with troubleshooting a big > performance regression we saw on our UNION queries after updating our > application from 1.24 to 1.29. Many of the tables in the queries are wide > (> 50 columns) so we use a custom push-down Project class. These custom > Projects are being eliminated in 1.29 UNION queries--though not when > replacing UNION with UNION ALL. > > > > The examples below illustrate minimal queries that reproduce the issue > our test database. > > > > When compared to the UNION query times using 1.24, the prepared UNION > statements take 2-2.5 times longer in 1.29. The prepare time is around 2 > times longer as well. > > > > In the examples below, the UNION and UNION ALL queries generate the same > result set in our test database. All Calcite tables have default collations > ("ISO-8859-1$en_US"). Several are rather wide, with many > 50 columns. > > > > In 1.29 for a simple query, both UNION and UNION ALL produce the same > query plan, with the exception of EnumerableUnion's `all` parameter: > > > > 1.29 > > EXPLAIN PLAN FOR > > SELECT Id FROM t1 > > UNION ALL > > SELECT Id FROM t2 > > > ----------------------------------------------------------------------------------------------------- > > EnumerableUnion(all=[true]) > > MyEnumerableConverter > > MyProject(Id=[$0]) > > MyTableScan(table=[[MySchema, t1]]) > > MyEnumerableConverter > > MyProject(Id=[$0]) > > MyTableScan(table=[[MySchema, t2]]) > > > > 1.29 > > EXPLAIN PLAN FOR > > SELECT Id FROM t1 > > UNION > > SELECT Id FROM t2 > > > ----------------------------------------------------------------------------------------------------- > > EnumerableUnion(all=[false]) > > MyEnumerableConverter > > MyProject(Id=[$0]) > > MyTableScan(table=[[MySchema, t1]]) > > MyEnumerableConverter > > MyProject(Id=[$0]) > > MyTableScan(table=[[MySchema, t2]]) > > > > The next examples add a JOIN to one of the tables in order to make this > a slightly more complicated query and trigger the issue. Unlike in 1.24, > the plans in 1.29 for UNION and UNION ALL now vary significantly: > > > > 1.24 (like the above queries, the UNION and UNION ALL plans differ only > by the `all` parameter, so the UNION ALL plan for 1.24 is omitted) > > EXPLAIN PLAN FOR > > SELECT Id FROM t1 > > UNION > > SELECT t3.Id FROM t2 JOIN t3 ON (t3.Id = t2.t3_Id) > > > ----------------------------------------------------------------------------------------------------- > > EnumerableUnion(all=[false]) > > MyEnumerableConverter > > MyProject(Id=[$0]) > > MyTableScan(table=[[MySchema, t1]]) > > EnumerableCalc(expr#0..1=[{inputs}], Id=[$t1]) > > EnumerableMergeJoin(condition=[=($0, $1)], joinType=[inner]) > > EnumerableSort(sort0=[$0], dir0=[ASC]) > > EnumerableCalc(expr#0..262=[{inputs}], > expr#263=[CAST($t178):BIGINT NOT NULL], t3_Id0=[$t263]) > > MyEnumerableConverter > > MyTableScan(table=[[MySchema, t2]]) > > EnumerableSort(sort0=[$0], dir0=[ASC]) > > MyEnumerableConverter > > MyProject(Id=[$0]) > > MyTableScan(table=[[MySchema, t3]]) > > > > 1.29 (note this UNION ALL plan is nearly identical plan to the above > `UNION` plan in 1.24) > > EXPLAIN PLAN FOR > > SELECT Id FROM t1 > > UNION ALL > > SELECT t3.Id FROM t2 JOIN t3 ON (t3.Id = t2.t3_Id) > > > ----------------------------------------------------------------------------------------------------- > > EnumerableUnion(all=[true]) > > MyEnumerableConverter > > MyProject(Id=[$0]) > > MyTableScan(table=[[MySchema, t1]]) > > EnumerableCalc(expr#0..1=[{inputs}], Id=[$t1]) > > EnumerableMergeJoin(condition=[=($0, $1)], joinType=[inner]) > > EnumerableSort(sort0=[$0], dir0=[ASC]) > > EnumerableCalc(expr#0..262=[{inputs}], > expr#263=[CAST($t178):BIGINT NOT NULL], t3_Id0=[$t263]) > > MyEnumerableConverter > > MyTableScan(table=[[MySchema, t2]]) > > EnumerableSort(sort0=[$0], dir0=[ASC]) > > MyEnumerableConverter > > MyProject(Id=[$0]) > > MyTableScan(table=[[MySchema, t3]]) > > > > 1.29 (note the loss of the `MyProject` node above t3's MyTableScan, and > now the inputs for the post-MergeJoin EnumerableCalc node are very wide > compared to before) > > EXPLAIN PLAN FOR > > SELECT Id FROM t1 > > UNION > > SELECT t3.Id FROM t2 JOIN t3 ON (t3.Id = t2.t3_Id) > > > ----------------------------------------------------------------------------------------------------- > > EnumerableUnion(all=[false]) > > MyEnumerableConverter > > MyProject(Id=[$0]) > > MyTableScan(table=[[MySchema, t1]]) > > EnumerableCalc(expr#0..402=[{inputs}], Id=[$t264]) > > EnumerableMergeJoin(condition=[=($263, $264)], joinType=[inner]) > > EnumerableSort(sort0=[$263], dir0=[ASC]) > > EnumerableCalc(expr#0..262=[{inputs}], > expr#263=[CAST($t178):BIGINT NOT NULL], proj#0..263=[{exprs}]) > > MyEnumerableConverter > > MyTableScan(table=[[MySchema, t2]]) > > EnumerableSort(sort0=[$0], dir0=[ASC]) > > MyEnumerableConverter > > MyTableScan(table=[[MySchema, t3]]) > > > > I've tried removing various rules and even restricting them, but the > custom Projects are still removed in the EnumerableMergeJoin despite > setting the self-cost to zero. > > > > Am I missing something? Any suggestions on where to look next? > > > > Thanks for any assistance/tips. > > > > Best, > > Zack > > > > >
