[
https://issues.apache.org/jira/browse/CALCITE-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stamatis Zampetakis resolved CALCITE-4494.
------------------------------------------
Resolution: Fixed
Fixed in
[78cc3e3f739704ec4fa36bee9b465b61a672f873|https://github.com/apache/calcite/commit/78cc3e3f739704ec4fa36bee9b465b61a672f873].
Thanks for making the planner faster [~aigor]!
> Improve planning performance with RelSubset check for Rel presence
> -------------------------------------------------------------------
>
> Key: CALCITE-4494
> URL: https://issues.apache.org/jira/browse/CALCITE-4494
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.26.0
> Environment: All environments
> Reporter: Igor Lozynskyi
> Assignee: Igor Lozynskyi
> Priority: Major
> Labels: performance, pull-request-available
> Fix For: 1.27.0
>
> Attachments: CalcitePerf_Planning_RelList_consumes_a_lot.png,
> CalcitePerf_Planning_TPCH_Q7_RelList_consumes_a_lot.png,
> CalcitePerf_Planning_after_improvements.png
>
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> *Problem*
> Currently, planning is noticeably slower than in version 1.25. The longer
> planning time seems to affect most queries, but it is especially pronounced
> for queries with many Rel nodes (particularly those with multiple joins).
> In a downstream project, we have a stress test that checks the planning time.
> In some cases, the planning time increases fourfold (for a query with 28
> joins).
> The main contributing factor (but not the only one) for the slow-down could
> be traced to [https://github.com/apache/calcite/pull/2222/files].
> *Potential Solution*
> As the reviewers mentioned, we can improve the current situation with two
> small changes:
> * Introduce a method to check that a Rel node belongs to the RelSubset
> instead of getting all Rel nodes (the current code may take up to 60% of the
> planning time).
> * Improve the null check in RelMdPredicates by building an error message in
> RelMdPredicates.ExprsItr only when it is required (may additionally take 10%
> of the planning time due to SortedMap.toString() being expensive when
> frequently called).
> With these 2 changes, I was able to regain most of the lost planning
> performance.
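
The two changes listed above can be illustrated with a minimal, self-contained Java sketch. It is not the code from the Calcite commit: `PlanningSketch`, `Node`, `Subset`, `CountingMap`, `eagerLookup` and `lazyLookup` are hypothetical stand-ins invented for illustration. The only real API used is the JDK's `Objects.requireNonNull(T, Supplier<String>)`, assumed here as one way to defer message construction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.TreeMap;

class PlanningSketch {

  /** Hypothetical stand-in for a planner node: an id plus a trait label. */
  static final class Node {
    final int id;
    final String trait;
    Node(int id, String trait) { this.id = id; this.trait = trait; }
  }

  /** Hypothetical stand-in for a subset: the nodes of a shared set that carry one trait. */
  static final class Subset {
    private final List<Node> setNodes;
    private final String trait;
    int listCopies = 0; // how many times a full member list was materialized

    Subset(List<Node> setNodes, String trait) {
      this.setNodes = setNodes;
      this.trait = trait;
    }

    /** "Before" style: allocates and fills a new list on every call. */
    List<Node> getNodeList() {
      listCopies++;
      List<Node> result = new ArrayList<>();
      for (Node n : setNodes) {
        if (n.trait.equals(trait)) {
          result.add(n);
        }
      }
      return result;
    }

    /** "After" style: answers the membership question with no allocation and an early exit. */
    boolean contains(Node candidate) {
      for (Node n : setNodes) {
        if (n == candidate) {
          return n.trait.equals(trait);
        }
      }
      return false;
    }
  }

  /** Map that counts toString() calls, to make the cost of eager messages visible. */
  static final class CountingMap extends TreeMap<String, String> {
    int toStringCalls = 0;
    @Override public String toString() {
      toStringCalls++;
      return super.toString();
    }
  }

  /** Eager null check: the message (and map.toString()) is built on every call. */
  static String eagerLookup(CountingMap map, String key) {
    String message = "no entry for " + key + " in " + map;
    return Objects.requireNonNull(map.get(key), message);
  }

  /** Lazy null check: the supplier runs only if the value is actually null. */
  static String lazyLookup(CountingMap map, String key) {
    return Objects.requireNonNull(map.get(key), () -> "no entry for " + key + " in " + map);
  }

  public static void main(String[] args) {
    List<Node> set = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      set.add(new Node(i, i % 2 == 0 ? "A" : "B"));
    }
    Subset subset = new Subset(set, "A");
    Node probe = set.get(10);
    System.out.println("via list copy: " + subset.getNodeList().contains(probe));
    System.out.println("via contains:  " + subset.contains(probe));

    CountingMap map = new CountingMap();
    map.put("k", "v");
    eagerLookup(map, "k");
    lazyLookup(map, "k");
    System.out.println("toString() calls: " + map.toStringCalls);
  }
}
```

In this sketch both membership checks give the same answer, but only `getNodeList()` allocates a list per call, and only `eagerLookup` pays for `toString()` on the success path. How much either saves in a real planner depends on how often the call sites are hit.
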
> The following flame graph clearly shows that the call to
> RelSubset.getRelList() from VolcanoRuleCall.onMatch() is expensive (28 join
> query):
> !CalcitePerf_Planning_RelList_consumes_a_lot.png|width=865,height=365!
> After the proposed improvements, the flame graph shows the following (28 join
> query):
> !CalcitePerf_Planning_after_improvements.png|width=561,height=472!
> It is clear that the HintStrategyTable.isRuleExcluded() call is expensive,
> but the overall picture is much better.
> Also, in my environment, the TPC-H Q7 test takes ~17% less time after the
> proposed improvements (39.6 sec before vs 32.9 sec after). Here, the flame
> graph shows that ordinary queries are also affected by the redundant
> RelSubset.getRelList() calls:
> !CalcitePerf_Planning_TPCH_Q7_RelList_consumes_a_lot.png|width=856,height=573!
>