Re: [Help] UNION queries with one or more SELECTs containing a JOIN are significantly slower after 1.24

Ruben Q L Sat, 12 Mar 2022 04:37:39 -0800

Hello,

At first glance, it could also be possible that the regression is
originated in the RelFieldTrimmer. Just a hypothesis.


Best,
Ruben



On Sat, Mar 12, 2022 at 6:35 AM Haisheng Yuan <[email protected]> wrote:

> Hi Zack,
>
> Looks like it is a regression.
> Are you able to provide a reproducible test case? You can log a JIRA along
> with the test case, so people can do the root cause analysis.
>
> Thanks,
> Haisheng Yuan
>
> On 2022/03/12 01:26:53 "Gramana, Zachary (GE Digital)" wrote:
> > Hello all!
> >
> > I would appreciate some help/guidance with troubleshooting a big
> performance regression we saw on our UNION queries after updating our
> application from 1.24 to 1.29. Many of the tables in the queries are wide
> (> 50 columns) so we use a custom push-down Project class. These custom
> Projects are being eliminated in 1.29 UNION queries--though not when
> replacing UNION with UNION ALL.
> >
> > The examples below illustrate minimal queries that reproduce the issue
> our test database.
> >
> > When compared to the UNION query times using 1.24, the prepared UNION
> statements take 2-2.5 times longer in 1.29. The prepare time is around 2
> times longer as well.
> >
> > In the examples below, the UNION and UNION ALL queries generate the same
> result set in our test database. All Calcite tables have default collations
> ("ISO-8859-1$en_US"). Several are rather wide, with many > 50 columns.
> >
> > In 1.29 for a simple query, both UNION and UNION ALL produce the same
> query plan, with the exception of EnumerableUnion's `all` parameter:
> >
> > 1.29
> > EXPLAIN PLAN FOR
> > SELECT Id FROM t1
> > UNION ALL
> > SELECT Id FROM t2
> >
> -----------------------------------------------------------------------------------------------------
> > EnumerableUnion(all=[true])
> >   MyEnumerableConverter
> >     MyProject(Id=[$0])
> >       MyTableScan(table=[[MySchema, t1]])
> >   MyEnumerableConverter
> >     MyProject(Id=[$0])
> >       MyTableScan(table=[[MySchema, t2]])
> >
> > 1.29
> > EXPLAIN PLAN FOR
> > SELECT Id FROM t1
> > UNION
> > SELECT Id FROM t2
> >
> -----------------------------------------------------------------------------------------------------
> > EnumerableUnion(all=[false])
> >   MyEnumerableConverter
> >     MyProject(Id=[$0])
> >       MyTableScan(table=[[MySchema, t1]])
> >   MyEnumerableConverter
> >     MyProject(Id=[$0])
> >       MyTableScan(table=[[MySchema, t2]])
> >
> > The next examples add a JOIN to one of the tables in order to make this
> a slightly more complicated query and trigger the issue. Unlike in 1.24,
> the plans in 1.29 for UNION and UNION ALL now vary significantly:
> >
> > 1.24 (like the above queries,  the UNION and UNION ALL plans differ only
> by the `all` parameter, so the UNION ALL plan for 1.24 is omitted)
> > EXPLAIN PLAN FOR
> > SELECT Id FROM t1
> > UNION
> > SELECT t3.Id FROM t2 JOIN t3 ON (t3.Id = t2.t3_Id)
> >
> -----------------------------------------------------------------------------------------------------
> > EnumerableUnion(all=[false])
> >   MyEnumerableConverter
> >     MyProject(Id=[$0])
> >       MyTableScan(table=[[MySchema, t1]])
> >   EnumerableCalc(expr#0..1=[{inputs}], Id=[$t1])
> >     EnumerableMergeJoin(condition=[=($0, $1)], joinType=[inner])
> >       EnumerableSort(sort0=[$0], dir0=[ASC])
> >         EnumerableCalc(expr#0..262=[{inputs}],
> expr#263=[CAST($t178):BIGINT NOT NULL], t3_Id0=[$t263])
> >           MyEnumerableConverter
> >             MyTableScan(table=[[MySchema, t2]])
> >       EnumerableSort(sort0=[$0], dir0=[ASC])
> >         MyEnumerableConverter
> >           MyProject(Id=[$0])
> >             MyTableScan(table=[[MySchema, t3]])
> >
> > 1.29 (note this UNION ALL plan is nearly identical plan to the above
> `UNION` plan in 1.24)
> > EXPLAIN PLAN FOR
> > SELECT Id FROM t1
> > UNION ALL
> > SELECT t3.Id FROM t2 JOIN t3 ON (t3.Id = t2.t3_Id)
> >
> -----------------------------------------------------------------------------------------------------
> > EnumerableUnion(all=[true])
> >   MyEnumerableConverter
> >     MyProject(Id=[$0])
> >       MyTableScan(table=[[MySchema, t1]])
> >   EnumerableCalc(expr#0..1=[{inputs}], Id=[$t1])
> >     EnumerableMergeJoin(condition=[=($0, $1)], joinType=[inner])
> >       EnumerableSort(sort0=[$0], dir0=[ASC])
> >         EnumerableCalc(expr#0..262=[{inputs}],
> expr#263=[CAST($t178):BIGINT NOT NULL], t3_Id0=[$t263])
> >           MyEnumerableConverter
> >             MyTableScan(table=[[MySchema, t2]])
> >       EnumerableSort(sort0=[$0], dir0=[ASC])
> >         MyEnumerableConverter
> >           MyProject(Id=[$0])
> >             MyTableScan(table=[[MySchema, t3]])
> >
> > 1.29 (note the loss of the `MyProject` node above t3's MyTableScan, and
> now the inputs for the post-MergeJoin EnumerableCalc node are very wide
> compared to before)
> > EXPLAIN PLAN FOR
> > SELECT Id FROM t1
> > UNION
> > SELECT t3.Id FROM t2 JOIN t3 ON (t3.Id = t2.t3_Id)
> >
> -----------------------------------------------------------------------------------------------------
> > EnumerableUnion(all=[false])
> >   MyEnumerableConverter
> >     MyProject(Id=[$0])
> >       MyTableScan(table=[[MySchema, t1]])
> >   EnumerableCalc(expr#0..402=[{inputs}], Id=[$t264])
> >     EnumerableMergeJoin(condition=[=($263, $264)], joinType=[inner])
> >       EnumerableSort(sort0=[$263], dir0=[ASC])
> >         EnumerableCalc(expr#0..262=[{inputs}],
> expr#263=[CAST($t178):BIGINT NOT NULL], proj#0..263=[{exprs}])
> >           MyEnumerableConverter
> >             MyTableScan(table=[[MySchema, t2]])
> >       EnumerableSort(sort0=[$0], dir0=[ASC])
> >         MyEnumerableConverter
> >           MyTableScan(table=[[MySchema, t3]])
> >
> > I've tried removing various rules and even restricting them, but the
> custom Projects are still removed in the EnumerableMergeJoin despite
> setting the self-cost to zero.
> >
> > Am I missing something? Any suggestions on where to look next?
> >
> > Thanks for any assistance/tips.
> >
> > Best,
> > Zack
> >
> >
>

Re: [Help] UNION queries with one or more SELECTs containing a JOIN are significantly slower after 1.24

Reply via email to