Re: Question about parallel query planning

Jihoon Son Tue, 09 Mar 2021 10:45:15 -0800

Hi Vladimir, thank you for your reply.

5 sec might not be bad from a technical point of view, but our user
wants their queries to finish in 2 - 3 seconds, including planning
time. The actual execution time for this particular query was 2
seconds, which I could bring down to 20 ms in my testing. However,
planning time is the bottleneck, so improving execution time did not
help much in this case.


> Did you have a chance to check which exact rules contributed to the planning 
> time? You may inject a listener to VolcanoPlanner to check that.

I hadn't, so I just looked at the code to learn how to inject a
listener into VolcanoPlanner, but I'm not sure how to do it. We
create an org.apache.calcite.prepare.PlannerImpl using
org.apache.calcite.tools.Frameworks.getPlanner()
(https://github.com/apache/druid/blob/master/sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidPlanner.java#L89).
This PlannerImpl holds a VolcanoPlanner, but it neither exposes the
planner nor provides an interface for adding a listener. I guess I
could add such an interface to PlannerImpl (and Planner) and make a
custom build of Calcite, but I'm wondering whether there is a way to
inject a listener without a custom build.
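
To make concrete what I would like to measure, here is a minimal
sketch of the per-rule bookkeeping in plain Java, with no Calcite
dependency. The class name RuleTimingCollector and its methods are
hypothetical, made up for illustration only. As far as I can tell,
record() would be called from a RelOptListener.ruleAttempted()
callback, timing the gap between the "before" and "after" events.
Registering such a listener on the VolcanoPlanner inside PlannerImpl
is the part I have not figured out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: accumulates elapsed time and attempt counts per rule name. */
final class RuleTimingCollector {
    private final Map<String, Long> totalNanos = new HashMap<>();
    private final Map<String, Integer> attempts = new HashMap<>();

    /** Records one attempt of the given rule that took elapsedNanos nanoseconds. */
    public void record(String ruleName, long elapsedNanos) {
        totalNanos.merge(ruleName, elapsedNanos, Long::sum);
        attempts.merge(ruleName, 1, Integer::sum);
    }

    /** Total time recorded for the rule, or 0 if it was never recorded. */
    public long totalNanos(String ruleName) {
        return totalNanos.getOrDefault(ruleName, 0L);
    }

    /** Number of recorded attempts for the rule, or 0 if it was never recorded. */
    public int attempts(String ruleName) {
        return attempts.getOrDefault(ruleName, 0);
    }

    /** Rule names ordered by total recorded time, slowest first. */
    public List<String> slowestFirst() {
        List<String> names = new ArrayList<>(totalNanos.keySet());
        names.sort(Comparator.comparingLong((String n) -> totalNanos.get(n)).reversed());
        return names;
    }
}
```

Printing slowestFirst() after planning finishes would show which
rules dominate the 5 seconds.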

Jihoon

On Tue, Mar 9, 2021 at 12:03 AM Vladimir Ozerov <[email protected]> wrote:
>
> *at such = at such scale
>
> On Tue, Mar 9, 2021 at 11:01, Vladimir Ozerov <[email protected]> wrote:
>
> > Hi Jihoon,
> >
> > I would say that 5 sec could be actually a pretty good result at such. Did
> > you have a chance to check which exact rules contributed to the planning
> > time? You may inject a listener to VolcanoPlanner to check that.
> >
> > Regards,
> > Vladimir
> >
> > On Tue, Mar 9, 2021 at 05:37, Jihoon Son <[email protected]> wrote:
> >
> >> Hi all,
> >>
> >> I posted the same question on the ASF slack channel, but am posting
> >> here as well to get a quicker response.
> >>
> > >> I'm seeing an issue where query planning takes a long time (5+
> > >> sec) for a giant union query that has 120 subqueries in it. I captured
> > >> a flame graph (attached to this email) to see where the bottleneck is,
> > >> and based on the flame graph, I believe the query planner spent most
> > >> of its time exploring the search space of candidate plans to find the
> > >> best plan. This seems to be because of the many subqueries in the same
> > >> union query. Is my understanding correct? If so, for this particular
> > >> case, it seems possible to parallelize exploring the search space. Do
> > >> you have any plans to parallelize this part? I'm not sure whether
> > >> it's already been done in the master branch, though. I searched for a
> > >> Jira ticket on https://issues.apache.org/jira/browse/CALCITE, but
> > >> couldn't find anything.
> >>
> >> Thanks,
> >> Jihoon
> >>
> >
