Are these, by any chance, pair-wise unions that can be flattened to n-way unions? That kind of transformation is almost always beneficial.
Julian > On Mar 11, 2021, at 12:34 AM, JiaTao Tao <[email protected]> wrote: > > Hi Jihoon Son > I met the same problem(hundreds of union), and my advice is to move some > rules to hep planner, like sub-query remove, union merge, etc. And this > works for me. > > Regards! > > Aron Tao > > > Julian Hyde <[email protected]> 于2021年3月10日周三 上午2:59写道: > >> At a high level, the Volcano/Cascades planning algorithm is amenable >> to parallelization. It uses a "work queue" (of matched rules that have >> not been applied yet) and each task is additive (adds relational >> expressions to the graph of relational expressions and their >> equivalence sets, and things are immutable once added to the graph). >> >> The devil will be in the details: making sure that the shared data >> structures work correctly when other threads are modifying them. For >> example, what happens when I try to add a RelNode to a set that is >> currently being merged merged with another set? >> >> Other shared data structures include metadata (aka statistics) and >> type factories. I think that their APIs are in fairly good shape for >> making them parallel. >> >> Julian >> >> >>> On Tue, Mar 9, 2021 at 10:45 AM Jihoon Son <[email protected]> wrote: >>> >>> Hi Vladimir, thank you for your reply. >>> >>> 5 sec might not be bad from a technical point of view, but our user >>> wants their queries to finish in 2 - 3 seconds including planning >>> time. The actual query execution time for this particular query was 2 >>> seconds which can be improved to 20 ms in my testing. However, the >>> planning time is the bottleneck and thus improving execution time did >>> not help much in this case. >>> >>>> Did you have a chance to check which exact rules contributed to the >> planning time? You may inject a listener to VolcanoPlanner to check that. >>> >>> I didn't before, so I just looked at the code to learn how to inject a >>> listener to VolcanoPlanner. But I'm not sure how I can do it. We are >>> creating a org.apache.calcite.prepare.PlannerImpl using >>> org.apache.calcite.tools.Frameworks.getPlanner() >>> ( >> https://github.com/apache/druid/blob/master/sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidPlanner.java#L89 >> ). >>> This PlannerImpl has VolcanoPlanner in it, but neither expose it to >>> outside nor provide an interface for adding a listener. I guess I can >>> add an interface in PlannerImpl (and Planner) and make a custom build >>> of Calcite. But I'm wondering if there is a way that I can inject a >>> listener without making a custom build. >>> >>> Jihoon >>> >>> On Tue, Mar 9, 2021 at 12:03 AM Vladimir Ozerov <[email protected]> >> wrote: >>>> >>>> *at such = at such scale >>>> >>>> Вт, 9 марта 2021 г. в 11:01, Vladimir Ozerov <[email protected]>: >>>> >>>>> Hi Jihoon, >>>>> >>>>> I would say that 5 sec could be actually a pretty good result at >> such. Did >>>>> you have a chance to check which exact rules contributed to the >> planning >>>>> time? You may inject a listener to VolcanoPlanner to check that. >>>>> >>>>> Regards, >>>>> Vladimir >>>>> >>>>> Вт, 9 марта 2021 г. в 05:37, Jihoon Son <[email protected]>: >>>>> >>>>>> Hi all, >>>>>> >>>>>> I posted the same question on the ASF slack channel, but am posting >>>>>> here as well to get a quicker response. >>>>>> >>>>>> I'm seeing an issue in query planning that it takes a long time (+5 >>>>>> sec) for a giant union query that has 120 subqueries in it. I >> captured >>>>>> a flame graph (attached in this email) to see where the bottleneck >> is, >>>>>> and based on the flame graph, I believe the query planner spent most >>>>>> of time to explore the search space of candidate plans to find the >>>>>> best plan. This seems because of those many subqueries in the same >>>>>> union query. Is my understanding correct? If so, for this particular >>>>>> case, it seems possible to parallelize exploring the search space. >> Do >>>>>> you have any plan for parallelizing this part? I'm not sure whether >>>>>> it's already done though in the master branch. I tried to search >> for a >>>>>> jira ticket on https://issues.apache.org/jira/browse/CALCITE, but >>>>>> couldn't find anything with my search skill. >>>>>> >>>>>> Thanks, >>>>>> Jihoon >>>>>> >>>>> >>
