Re: Question about parallel query planning

Julian Hyde Tue, 09 Mar 2021 10:59:57 -0800

At a high level, the Volcano/Cascades planning algorithm is amenable
to parallelization. It uses a "work queue" (of matched rules that have
not been applied yet) and each task is additive (adds relational
expressions to the graph of relational expressions and their
equivalence sets, and things are immutable once added to the graph).


The devil will be in the details: making sure that the shared data
structures work correctly when other threads are modifying them. For
example, what happens when I try to add a RelNode to a set that is
currently being merged merged with another set?

Other shared data structures include metadata (aka statistics) and
type factories. I think that their APIs are in fairly good shape for
making them parallel.

Julian


On Tue, Mar 9, 2021 at 10:45 AM Jihoon Son <[email protected]> wrote:
>
> Hi Vladimir, thank you for your reply.
>
> 5 sec might not be bad from a technical point of view, but our user
> wants their queries to finish in 2 - 3 seconds including planning
> time. The actual query execution time for this particular query was 2
> seconds which can be improved to 20 ms in my testing. However, the
> planning time is the bottleneck and thus improving execution time did
> not help much in this case.
>
> > Did you have a chance to check which exact rules contributed to the 
> > planning time? You may inject a listener to VolcanoPlanner to check that.
>
> I didn't before, so I just looked at the code to learn how to inject a
> listener to VolcanoPlanner. But I'm not sure how I can do it. We are
> creating a org.apache.calcite.prepare.PlannerImpl using
> org.apache.calcite.tools.Frameworks.getPlanner()
> (https://github.com/apache/druid/blob/master/sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidPlanner.java#L89).
> This PlannerImpl has VolcanoPlanner in it, but neither expose it to
> outside nor provide an interface for adding a listener. I guess I can
> add an interface in PlannerImpl (and Planner) and make a custom build
> of Calcite. But I'm wondering if there is a way that I can inject a
> listener without making a custom build.
>
> Jihoon
>
> On Tue, Mar 9, 2021 at 12:03 AM Vladimir Ozerov <[email protected]> wrote:
> >
> > *at such = at such scale
> >
> > Вт, 9 марта 2021 г. в 11:01, Vladimir Ozerov <[email protected]>:
> >
> > > Hi Jihoon,
> > >
> > > I would say that 5 sec could be actually a pretty good result at such. Did
> > > you have a chance to check which exact rules contributed to the planning
> > > time? You may inject a listener to VolcanoPlanner to check that.
> > >
> > > Regards,
> > > Vladimir
> > >
> > > Вт, 9 марта 2021 г. в 05:37, Jihoon Son <[email protected]>:
> > >
> > >> Hi all,
> > >>
> > >> I posted the same question on the ASF slack channel, but am posting
> > >> here as well to get a quicker response.
> > >>
> > >> I'm seeing an issue in query planning that it takes a long time (+5
> > >> sec) for a giant union query that has 120 subqueries in it. I captured
> > >> a flame graph (attached in this email) to see where the bottleneck is,
> > >> and based on the flame graph, I believe the query planner spent most
> > >> of time to explore the search space of candidate plans to find the
> > >> best plan. This seems because of those many subqueries in the same
> > >> union query. Is my understanding correct? If so, for this particular
> > >> case, it seems possible to parallelize exploring the search space. Do
> > >> you have any plan for parallelizing this part? I'm not sure whether
> > >> it's already done though in the master branch. I tried to search for a
> > >> jira ticket on https://issues.apache.org/jira/browse/CALCITE, but
> > >> couldn't find anything with my search skill.
> > >>
> > >> Thanks,
> > >> Jihoon
> > >>
> > >

Re: Question about parallel query planning

Reply via email to