Re: Question about parallel query planning

Julian Hyde Thu, 11 Mar 2021 13:11:30 -0800

Are these, by any chance, pair-wise unions that can be flattened to n-way 
unions? That kind of transformation is almost always beneficial.


Julian
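
To make the flattening concrete, here is a minimal, self-contained sketch of collapsing nested pair-wise unions into one n-way union. The Node, Leaf and Union classes are toy stand-ins invented for this illustration, not Calcite's RelNode classes, and the sketch conservatively merges a child union only when its ALL/DISTINCT flag matches its parent's.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of union flattening; all classes here are hypothetical stand-ins. */
public class UnionFlattenSketch {
    static abstract class Node {}

    /** A leaf input, e.g. one of the subqueries under the union. */
    static final class Leaf extends Node {
        final String name;
        Leaf(String name) { this.name = name; }
    }

    /** A union over any number of inputs; 'all' separates UNION ALL from UNION. */
    static final class Union extends Node {
        final boolean all;
        final List<Node> inputs;
        Union(boolean all, List<Node> inputs) { this.all = all; this.inputs = inputs; }
    }

    /** Returns an equivalent tree in which same-kind nested unions are merged. */
    static Node flatten(Node node) {
        if (!(node instanceof Union)) {
            return node;
        }
        Union union = (Union) node;
        List<Node> flat = new ArrayList<>();
        for (Node input : union.inputs) {
            Node child = flatten(input);
            if (child instanceof Union && ((Union) child).all == union.all) {
                // Same kind of union: pull the grandchildren up one level.
                flat.addAll(((Union) child).inputs);
            } else {
                flat.add(child);
            }
        }
        return new Union(union.all, flat);
    }

    /** Builds a left-deep chain of binary UNION ALLs over n leaves (n >= 1). */
    static Node leftDeepChain(int n) {
        Node current = new Leaf("q0");
        for (int i = 1; i < n; i++) {
            current = new Union(true, Arrays.asList(current, new Leaf("q" + i)));
        }
        return current;
    }

    public static void main(String[] args) {
        Node flat = flatten(leftDeepChain(120));
        System.out.println(((Union) flat).inputs.size()); // prints 120
    }
}
```

With 120 subqueries, a left-deep chain contains 119 binary union nodes; after flattening there is a single union with 120 inputs, so the planner has far fewer union nodes to match rules against.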

> On Mar 11, 2021, at 12:34 AM, JiaTao Tao <[email protected]> wrote:
> 
> Hi Jihoon Son
> I ran into the same problem (hundreds of unions). My advice is to move
> some rules, such as sub-query removal and union merge, to the HepPlanner.
> That worked for me.
> 
> Regards!
> 
> Aron Tao
> 
> 
> Julian Hyde <[email protected]> wrote on Wed, Mar 10, 2021 at 2:59 AM:
> 
>> At a high level, the Volcano/Cascades planning algorithm is amenable
>> to parallelization. It uses a "work queue" (of matched rules that have
>> not been applied yet) and each task is additive (adds relational
>> expressions to the graph of relational expressions and their
>> equivalence sets, and things are immutable once added to the graph).
>> 
>> The devil will be in the details: making sure that the shared data
>> structures work correctly when other threads are modifying them. For
>> example, what happens when I try to add a RelNode to a set that is
>> currently being merged with another set?
>> 
>> Other shared data structures include metadata (aka statistics) and
>> type factories. I think that their APIs are in fairly good shape for
>> making them parallel.
>> 
>> Julian
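
The additive work-queue structure described above can be sketched with plain java.util.concurrent. This is a toy under loud assumptions: integers stand in for relational expressions, the two "rules" (doubling and adding 3) are invented, and the sketch deliberately does not model merging equivalence sets, which is the hard part raised above. It only shows that when tasks only ever add immutable items to a shared set, an atomic "add if absent" is enough to keep workers from duplicating work.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Phaser;

/** Toy parallel exploration with additive tasks; not Calcite code. */
public class ParallelWorkQueueSketch {

    /** Explores everything reachable from seed (up to limit) using a thread pool. */
    static Set<Integer> explore(int seed, int limit, int threads) {
        Set<Integer> graph = ConcurrentHashMap.newKeySet(); // shared, add-only
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Phaser pending = new Phaser(1); // one party for the calling thread
        submit(seed, limit, graph, pool, pending);
        pending.arriveAndAwaitAdvance(); // returns once every task has finished
        pool.shutdown();
        return graph;
    }

    /** Adds expr to the graph and, if it was new, queues a task to expand it. */
    static void submit(int expr, int limit, Set<Integer> graph,
                       ExecutorService pool, Phaser pending) {
        // add() is atomic, so exactly one thread "wins" each new expression.
        if (expr > limit || !graph.add(expr)) {
            return;
        }
        pending.register(); // one party per queued task
        pool.execute(() -> {
            try {
                // Two invented "rules", each producing one new expression.
                submit(expr * 2, limit, graph, pool, pending);
                submit(expr + 3, limit, graph, pool, pending);
            } finally {
                pending.arriveAndDeregister();
            }
        });
    }

    public static void main(String[] args) {
        System.out.println(explore(1, 1000, 4).size());
    }
}
```

Because each task only adds to the set and never mutates what is already there, the final contents do not depend on the order in which workers run. A real planner would additionally need a safe protocol for the set-merge case.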
>> 
>> 
>>> On Tue, Mar 9, 2021 at 10:45 AM Jihoon Son <[email protected]> wrote:
>>> 
>>> Hi Vladimir, thank you for your reply.
>>> 
>>> 5 sec might not be bad from a technical point of view, but our user
>>> wants their queries to finish in 2 - 3 seconds including planning
>>> time. The actual query execution time for this particular query was 2
>>> seconds which can be improved to 20 ms in my testing. However, the
>>> planning time is the bottleneck and thus improving execution time did
>>> not help much in this case.
>>> 
>>>> Did you have a chance to check which exact rules contributed to the
>>>> planning time? You may inject a listener to VolcanoPlanner to check that.
>>> 
>>> I hadn't, so I just looked at the code to learn how to inject a
>>> listener into VolcanoPlanner, but I'm not sure how to do it. We
>>> create an org.apache.calcite.prepare.PlannerImpl using
>>> org.apache.calcite.tools.Frameworks.getPlanner()
>>> (https://github.com/apache/druid/blob/master/sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidPlanner.java#L89).
>>> This PlannerImpl has a VolcanoPlanner in it, but it neither exposes the
>>> planner nor provides an interface for adding a listener. I guess I could
>>> add an interface to PlannerImpl (and Planner) and make a custom build
>>> of Calcite, but I'm wondering whether there is a way to inject a
>>> listener without making a custom build.
>>> 
>>> Jihoon
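
For the per-rule timing suggested above, the bookkeeping a listener would do can be sketched independently of Calcite. In Calcite the hook is RelOptPlanner.addListener(RelOptListener), and RelOptListener.ruleAttempted is called with an event whose isBefore() flag separates the start of a rule call from its end. The class below is a hypothetical stand-in that only accumulates elapsed time per rule name; the method names before/after are invented for this sketch, and it assumes calls to the same rule do not nest.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical per-rule time accumulator; wiring it to a planner is left out. */
public class RuleTimingSketch {
    private final Map<String, Long> totalNanos = new LinkedHashMap<>();
    private final Map<String, Long> startNanos = new HashMap<>();

    /** Record that a rule call is about to start at time 'now' (nanoseconds). */
    void before(String rule, long now) {
        startNanos.put(rule, now);
    }

    /** Record that the rule call finished and add its duration to the total. */
    void after(String rule, long now) {
        Long start = startNanos.remove(rule);
        if (start != null) {
            totalNanos.merge(rule, now - start, Long::sum);
        }
    }

    /** Total nanoseconds spent per rule, in first-seen order. */
    Map<String, Long> totals() {
        return totalNanos;
    }

    public static void main(String[] args) {
        RuleTimingSketch timer = new RuleTimingSketch();
        // Simulated events; a real listener would pass System.nanoTime().
        timer.before("UnionMergeRule", 100L);
        timer.after("UnionMergeRule", 160L);
        timer.before("UnionMergeRule", 200L);
        timer.after("UnionMergeRule", 250L);
        System.out.println(timer.totals()); // prints {UnionMergeRule=110}
    }
}
```

One possible way to reach the planner without a custom build, which I have not verified against the Calcite version in use, is through a RelNode: RelNode.getCluster().getPlanner() returns the RelOptPlanner that the cluster was created with.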
>>> 
>>> On Tue, Mar 9, 2021 at 12:03 AM Vladimir Ozerov <[email protected]> wrote:
>>>> 
>>>> *at such = at such scale
>>>> 
>>>> On Tue, Mar 9, 2021 at 11:01, Vladimir Ozerov <[email protected]> wrote:
>>>> 
>>>>> Hi Jihoon,
>>>>> 
>>>>> I would say that 5 sec could be actually a pretty good result at
>>>>> such. Did you have a chance to check which exact rules contributed
>>>>> to the planning time? You may inject a listener to VolcanoPlanner to
>>>>> check that.
>>>>> 
>>>>> Regards,
>>>>> Vladimir
>>>>> 
>>>>> On Tue, Mar 9, 2021 at 05:37, Jihoon Son <[email protected]> wrote:
>>>>> 
>>>>>> Hi all,
>>>>>> 
>>>>>> I posted the same question on the ASF slack channel, but am posting
>>>>>> here as well to get a quicker response.
>>>>>> 
>>>>>> I'm seeing an issue in query planning: it takes a long time (5+
>>>>>> sec) to plan a giant union query that has 120 subqueries in it. I
>>>>>> captured a flame graph (attached in this email) to see where the
>>>>>> bottleneck is, and based on the flame graph, I believe the query
>>>>>> planner spent most of its time exploring the search space of
>>>>>> candidate plans to find the best plan. This seems to be because of
>>>>>> the many subqueries in the same union query. Is my understanding
>>>>>> correct? If so, for this particular case, it seems possible to
>>>>>> parallelize exploring the search space. Do you have any plan for
>>>>>> parallelizing this part? I'm not sure whether it has already been
>>>>>> done in the master branch. I tried to search for a jira ticket on
>>>>>> https://issues.apache.org/jira/browse/CALCITE, but couldn't find
>>>>>> anything with my search skill.
>>>>>> 
>>>>>> Thanks,
>>>>>> Jihoon
>>>>>> 
>>>>> 
>>
