Re: [Help] UNION queries with one or more SELECTs containing a JOIN are significantly slower after 1.24

Haisheng Yuan Fri, 11 Mar 2022 22:35:47 -0800

Hi Zack,

Looks like it is a regression.
Are you able to provide a reproducible test case? You can log a JIRA along with 
the test case, so people can do the root cause analysis.


Thanks,
Haisheng Yuan

On 2022/03/12 01:26:53 "Gramana, Zachary (GE Digital)" wrote:
> Hello all!
> 
> I would appreciate some help/guidance with troubleshooting a big performance 
> regression we saw on our UNION queries after updating our application from 
> 1.24 to 1.29. Many of the tables in the queries are wide (> 50 columns) so we 
> use a custom push-down Project class. These custom Projects are being 
> eliminated in 1.29 UNION queries--though not when replacing UNION with UNION 
> ALL.
> 
> The examples below illustrate minimal queries that reproduce the issue our 
> test database.
> 
> When compared to the UNION query times using 1.24, the prepared UNION 
> statements take 2-2.5 times longer in 1.29. The prepare time is around 2 
> times longer as well.
> 
> In the examples below, the UNION and UNION ALL queries generate the same 
> result set in our test database. All Calcite tables have default collations 
> ("ISO-8859-1$en_US"). Several are rather wide, with many > 50 columns.
> 
> In 1.29 for a simple query, both UNION and UNION ALL produce the same query 
> plan, with the exception of EnumerableUnion's `all` parameter:
> 
> 1.29
> EXPLAIN PLAN FOR
> SELECT Id FROM t1
> UNION ALL
> SELECT Id FROM t2
> -----------------------------------------------------------------------------------------------------
> EnumerableUnion(all=[true])
>   MyEnumerableConverter
>     MyProject(Id=[$0])
>       MyTableScan(table=[[MySchema, t1]])
>   MyEnumerableConverter
>     MyProject(Id=[$0])
>       MyTableScan(table=[[MySchema, t2]])
> 
> 1.29
> EXPLAIN PLAN FOR
> SELECT Id FROM t1
> UNION
> SELECT Id FROM t2
> -----------------------------------------------------------------------------------------------------
> EnumerableUnion(all=[false])
>   MyEnumerableConverter
>     MyProject(Id=[$0])
>       MyTableScan(table=[[MySchema, t1]])
>   MyEnumerableConverter
>     MyProject(Id=[$0])
>       MyTableScan(table=[[MySchema, t2]])
> 
> The next examples add a JOIN to one of the tables in order to make this a 
> slightly more complicated query and trigger the issue. Unlike in 1.24, the 
> plans in 1.29 for UNION and UNION ALL now vary significantly:
> 
> 1.24 (like the above queries,  the UNION and UNION ALL plans differ only by 
> the `all` parameter, so the UNION ALL plan for 1.24 is omitted)
> EXPLAIN PLAN FOR
> SELECT Id FROM t1
> UNION
> SELECT t3.Id FROM t2 JOIN t3 ON (t3.Id = t2.t3_Id)
> -----------------------------------------------------------------------------------------------------
> EnumerableUnion(all=[false])
>   MyEnumerableConverter
>     MyProject(Id=[$0])
>       MyTableScan(table=[[MySchema, t1]])
>   EnumerableCalc(expr#0..1=[{inputs}], Id=[$t1])
>     EnumerableMergeJoin(condition=[=($0, $1)], joinType=[inner])
>       EnumerableSort(sort0=[$0], dir0=[ASC])
>         EnumerableCalc(expr#0..262=[{inputs}], expr#263=[CAST($t178):BIGINT 
> NOT NULL], t3_Id0=[$t263])
>           MyEnumerableConverter
>             MyTableScan(table=[[MySchema, t2]])
>       EnumerableSort(sort0=[$0], dir0=[ASC])
>         MyEnumerableConverter
>           MyProject(Id=[$0])
>             MyTableScan(table=[[MySchema, t3]])
> 
> 1.29 (note this UNION ALL plan is nearly identical plan to the above `UNION` 
> plan in 1.24)
> EXPLAIN PLAN FOR
> SELECT Id FROM t1
> UNION ALL
> SELECT t3.Id FROM t2 JOIN t3 ON (t3.Id = t2.t3_Id)
> -----------------------------------------------------------------------------------------------------
> EnumerableUnion(all=[true])
>   MyEnumerableConverter
>     MyProject(Id=[$0])
>       MyTableScan(table=[[MySchema, t1]])
>   EnumerableCalc(expr#0..1=[{inputs}], Id=[$t1])
>     EnumerableMergeJoin(condition=[=($0, $1)], joinType=[inner])
>       EnumerableSort(sort0=[$0], dir0=[ASC])
>         EnumerableCalc(expr#0..262=[{inputs}], expr#263=[CAST($t178):BIGINT 
> NOT NULL], t3_Id0=[$t263])
>           MyEnumerableConverter
>             MyTableScan(table=[[MySchema, t2]])
>       EnumerableSort(sort0=[$0], dir0=[ASC])
>         MyEnumerableConverter
>           MyProject(Id=[$0])
>             MyTableScan(table=[[MySchema, t3]])
> 
> 1.29 (note the loss of the `MyProject` node above t3's MyTableScan, and now 
> the inputs for the post-MergeJoin EnumerableCalc node are very wide compared 
> to before)
> EXPLAIN PLAN FOR
> SELECT Id FROM t1
> UNION
> SELECT t3.Id FROM t2 JOIN t3 ON (t3.Id = t2.t3_Id)
> -----------------------------------------------------------------------------------------------------
> EnumerableUnion(all=[false])
>   MyEnumerableConverter
>     MyProject(Id=[$0])
>       MyTableScan(table=[[MySchema, t1]])
>   EnumerableCalc(expr#0..402=[{inputs}], Id=[$t264])
>     EnumerableMergeJoin(condition=[=($263, $264)], joinType=[inner])
>       EnumerableSort(sort0=[$263], dir0=[ASC])
>         EnumerableCalc(expr#0..262=[{inputs}], expr#263=[CAST($t178):BIGINT 
> NOT NULL], proj#0..263=[{exprs}])
>           MyEnumerableConverter
>             MyTableScan(table=[[MySchema, t2]])
>       EnumerableSort(sort0=[$0], dir0=[ASC])
>         MyEnumerableConverter
>           MyTableScan(table=[[MySchema, t3]])
> 
> I've tried removing various rules and even restricting them, but the custom 
> Projects are still removed in the EnumerableMergeJoin despite setting the 
> self-cost to zero.
> 
> Am I missing something? Any suggestions on where to look next?
> 
> Thanks for any assistance/tips.
> 
> Best,
> Zack
> 
>

Re: [Help] UNION queries with one or more SELECTs containing a JOIN are significantly slower after 1.24

Reply via email to