You can move the union merge rule to hep, this rule has the benefit.
Regards! Aron Tao Jihoon Son <[email protected]> 于2021年3月12日周五 上午6:36写道: > Julian, thank you for the pointers. I will look at more closely how we > can make the shared data structures thread-safe. > > > Are these, by any chance, pair-wise unions that can be flattened to > n-way unions? That kind of transformation is almost always beneficial. > > Can you elaborate more on this idea? What do you mean by n-way unions? > This particular query I'm looking at is a flat union query that has > 121 simple scan queries. All these subqueries have a pattern of > "SELECT 'string_literal' FROM table WHERE filter LIMIT 1". I think > this union query can be rewritten to a similar query to avoid using > UNIONs at all, such as a scan query using CASE WHEN or an aggregate > query using aggregate functions with FILTER. However, execution time > of the rewritten query would be slower than the original union query > as the LIMIT clause cannot be pushed down to scan. > > > my advice is to move some rules to hep planner, like sub-query remove, > union merge, etc. > > Aron, thank you for the tip. The subquery remove rule is already > processed by HepPlanner. For the union merge rule, we disabled it > because of the performance issue with unions. Do you have any other > suggestions for what rules to move? > > Thanks, > Jihoon > > On Thu, Mar 11, 2021 at 1:11 PM Julian Hyde <[email protected]> > wrote: > > > > Are these, by any chance, pair-wise unions that can be flattened to > n-way unions? That kind of transformation is almost always beneficial. > > > > Julian > > > > > On Mar 11, 2021, at 12:34 AM, JiaTao Tao <[email protected]> wrote: > > > > > > Hi Jihoon Son > > > I met the same problem(hundreds of union), and my advice is to move > some > > > rules to hep planner, like sub-query remove, union merge, etc. And this > > > works for me. > > > > > > Regards! > > > > > > Aron Tao > > > > > > > > > Julian Hyde <[email protected]> 于2021年3月10日周三 上午2:59写道: > > > > > >> At a high level, the Volcano/Cascades planning algorithm is amenable > > >> to parallelization. It uses a "work queue" (of matched rules that have > > >> not been applied yet) and each task is additive (adds relational > > >> expressions to the graph of relational expressions and their > > >> equivalence sets, and things are immutable once added to the graph). > > >> > > >> The devil will be in the details: making sure that the shared data > > >> structures work correctly when other threads are modifying them. For > > >> example, what happens when I try to add a RelNode to a set that is > > >> currently being merged merged with another set? > > >> > > >> Other shared data structures include metadata (aka statistics) and > > >> type factories. I think that their APIs are in fairly good shape for > > >> making them parallel. > > >> > > >> Julian > > >> > > >> > > >>> On Tue, Mar 9, 2021 at 10:45 AM Jihoon Son <[email protected]> > wrote: > > >>> > > >>> Hi Vladimir, thank you for your reply. > > >>> > > >>> 5 sec might not be bad from a technical point of view, but our user > > >>> wants their queries to finish in 2 - 3 seconds including planning > > >>> time. The actual query execution time for this particular query was 2 > > >>> seconds which can be improved to 20 ms in my testing. However, the > > >>> planning time is the bottleneck and thus improving execution time did > > >>> not help much in this case. > > >>> > > >>>> Did you have a chance to check which exact rules contributed to the > > >> planning time? You may inject a listener to VolcanoPlanner to check > that. > > >>> > > >>> I didn't before, so I just looked at the code to learn how to inject > a > > >>> listener to VolcanoPlanner. But I'm not sure how I can do it. We are > > >>> creating a org.apache.calcite.prepare.PlannerImpl using > > >>> org.apache.calcite.tools.Frameworks.getPlanner() > > >>> ( > > >> > https://github.com/apache/druid/blob/master/sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidPlanner.java#L89 > > >> ). > > >>> This PlannerImpl has VolcanoPlanner in it, but neither expose it to > > >>> outside nor provide an interface for adding a listener. I guess I can > > >>> add an interface in PlannerImpl (and Planner) and make a custom build > > >>> of Calcite. But I'm wondering if there is a way that I can inject a > > >>> listener without making a custom build. > > >>> > > >>> Jihoon > > >>> > > >>> On Tue, Mar 9, 2021 at 12:03 AM Vladimir Ozerov <[email protected]> > > >> wrote: > > >>>> > > >>>> *at such = at such scale > > >>>> > > >>>> Вт, 9 марта 2021 г. в 11:01, Vladimir Ozerov <[email protected]>: > > >>>> > > >>>>> Hi Jihoon, > > >>>>> > > >>>>> I would say that 5 sec could be actually a pretty good result at > > >> such. Did > > >>>>> you have a chance to check which exact rules contributed to the > > >> planning > > >>>>> time? You may inject a listener to VolcanoPlanner to check that. > > >>>>> > > >>>>> Regards, > > >>>>> Vladimir > > >>>>> > > >>>>> Вт, 9 марта 2021 г. в 05:37, Jihoon Son <[email protected]>: > > >>>>> > > >>>>>> Hi all, > > >>>>>> > > >>>>>> I posted the same question on the ASF slack channel, but am > posting > > >>>>>> here as well to get a quicker response. > > >>>>>> > > >>>>>> I'm seeing an issue in query planning that it takes a long time > (+5 > > >>>>>> sec) for a giant union query that has 120 subqueries in it. I > > >> captured > > >>>>>> a flame graph (attached in this email) to see where the bottleneck > > >> is, > > >>>>>> and based on the flame graph, I believe the query planner spent > most > > >>>>>> of time to explore the search space of candidate plans to find the > > >>>>>> best plan. This seems because of those many subqueries in the same > > >>>>>> union query. Is my understanding correct? If so, for this > particular > > >>>>>> case, it seems possible to parallelize exploring the search space. > > >> Do > > >>>>>> you have any plan for parallelizing this part? I'm not sure > whether > > >>>>>> it's already done though in the master branch. I tried to search > > >> for a > > >>>>>> jira ticket on https://issues.apache.org/jira/browse/CALCITE, but > > >>>>>> couldn't find anything with my search skill. > > >>>>>> > > >>>>>> Thanks, > > >>>>>> Jihoon > > >>>>>> > > >>>>> > > >> >
