Re: Question about parallel query planning

JiaTao Tao Thu, 11 Mar 2021 00:34:57 -0800

Hi Jihoon Son
I met the same problem(hundreds of union), and my advice is to move some
rules to hep planner, like sub-query remove, union merge, etc. And this
works for me.


Regards!

Aron Tao


Julian Hyde <[email protected]> 于2021年3月10日周三 上午2:59写道：

> At a high level, the Volcano/Cascades planning algorithm is amenable
> to parallelization. It uses a "work queue" (of matched rules that have
> not been applied yet) and each task is additive (adds relational
> expressions to the graph of relational expressions and their
> equivalence sets, and things are immutable once added to the graph).
>
> The devil will be in the details: making sure that the shared data
> structures work correctly when other threads are modifying them. For
> example, what happens when I try to add a RelNode to a set that is
> currently being merged merged with another set?
>
> Other shared data structures include metadata (aka statistics) and
> type factories. I think that their APIs are in fairly good shape for
> making them parallel.
>
> Julian
>
>
> On Tue, Mar 9, 2021 at 10:45 AM Jihoon Son <[email protected]> wrote:
> >
> > Hi Vladimir, thank you for your reply.
> >
> > 5 sec might not be bad from a technical point of view, but our user
> > wants their queries to finish in 2 - 3 seconds including planning
> > time. The actual query execution time for this particular query was 2
> > seconds which can be improved to 20 ms in my testing. However, the
> > planning time is the bottleneck and thus improving execution time did
> > not help much in this case.
> >
> > > Did you have a chance to check which exact rules contributed to the
> planning time? You may inject a listener to VolcanoPlanner to check that.
> >
> > I didn't before, so I just looked at the code to learn how to inject a
> > listener to VolcanoPlanner. But I'm not sure how I can do it. We are
> > creating a org.apache.calcite.prepare.PlannerImpl using
> > org.apache.calcite.tools.Frameworks.getPlanner()
> > (
> https://github.com/apache/druid/blob/master/sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidPlanner.java#L89
> ).
> > This PlannerImpl has VolcanoPlanner in it, but neither expose it to
> > outside nor provide an interface for adding a listener. I guess I can
> > add an interface in PlannerImpl (and Planner) and make a custom build
> > of Calcite. But I'm wondering if there is a way that I can inject a
> > listener without making a custom build.
> >
> > Jihoon
> >
> > On Tue, Mar 9, 2021 at 12:03 AM Vladimir Ozerov <[email protected]>
> wrote:
> > >
> > > *at such = at such scale
> > >
> > > Вт, 9 марта 2021 г. в 11:01, Vladimir Ozerov <[email protected]>:
> > >
> > > > Hi Jihoon,
> > > >
> > > > I would say that 5 sec could be actually a pretty good result at
> such. Did
> > > > you have a chance to check which exact rules contributed to the
> planning
> > > > time? You may inject a listener to VolcanoPlanner to check that.
> > > >
> > > > Regards,
> > > > Vladimir
> > > >
> > > > Вт, 9 марта 2021 г. в 05:37, Jihoon Son <[email protected]>:
> > > >
> > > >> Hi all,
> > > >>
> > > >> I posted the same question on the ASF slack channel, but am posting
> > > >> here as well to get a quicker response.
> > > >>
> > > >> I'm seeing an issue in query planning that it takes a long time (+5
> > > >> sec) for a giant union query that has 120 subqueries in it. I
> captured
> > > >> a flame graph (attached in this email) to see where the bottleneck
> is,
> > > >> and based on the flame graph, I believe the query planner spent most
> > > >> of time to explore the search space of candidate plans to find the
> > > >> best plan. This seems because of those many subqueries in the same
> > > >> union query. Is my understanding correct? If so, for this particular
> > > >> case, it seems possible to parallelize exploring the search space.
> Do
> > > >> you have any plan for parallelizing this part? I'm not sure whether
> > > >> it's already done though in the master branch. I tried to search
> for a
> > > >> jira ticket on https://issues.apache.org/jira/browse/CALCITE, but
> > > >> couldn't find anything with my search skill.
> > > >>
> > > >> Thanks,
> > > >> Jihoon
> > > >>
> > > >
>

Re: Question about parallel query planning

Reply via email to