[
https://issues.apache.org/jira/browse/SPARK-34989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apache Spark reassigned SPARK-34989:
------------------------------------
Assignee: (was: Apache Spark)
> Improve the performance of mapChildren and withNewChildren methods
> ------------------------------------------------------------------
>
> Key: SPARK-34989
> URL: https://issues.apache.org/jira/browse/SPARK-34989
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Ali Afroozeh
> Priority: Major
>
> One of the main performance bottlenecks in query compilation is
> overly-generic tree transformation methods, namely {{mapChildren}} and
> {{withNewChildren}} (defined in {{TreeNode}}). These methods have an
> overly-generic implementation to iterate over the children and rely on
> reflection to create new instances. We have observed that, especially for
> queries with large query plans, a significant amount of CPU cycles are wasted
> in these methods. In this PR we make these methods more efficient, by
> delegating the iteration and instantiation to concrete node types. The
> benchmarks show that we can expect significant performance improvement in
> total query compilation time in queries with large query plans (from 30-80%)
> and about 20% on average.
> h4. Problem detail
> The {{mapChildren}} method in {{TreeNode}} is overly generic and costly. To
> be more specific, this method:
> * iterates over all the fields of a node using Scala’s product iterator.
> While the iteration is not reflection-based, thanks to the Scala compiler
> generating code for {{Product}}, we create many anonymous functions and visit
> many nested structures (recursive calls).
> The anonymous functions (presumably compiled to Java anonymous inner
> classes) also show up quite high on the list in the object allocation
> profiles, so we are putting unnecessary pressure on GC here.
> * does a lot of comparisons. Basically for each element returned from the
> product iterator, we check if it is a child (contained in the list of
> children) and then transform it. We can avoid that by just iterating over
> children, but in the current implementation, we need to gather all the fields
> (only transform the children) so that we can instantiate the object using the
> reflection.
> * creates objects using reflection, by delegating to the {{makeCopy}}
> method, which is several orders of magnitude slower than using the
> constructor.
> h4. Solution
> The proposed solution in this PR is rather straightforward: we rewrite the
> {{mapChildren}} method using the {{children}} and {{withNewChildren}}
> methods. The default {{withNewChildren}} method suffers from the same
> problems as {{mapChildren}} and we need to make it more efficient by
> specializing it in concrete classes. Similar to how each concrete query plan
> node already defines its children, it should also define how they can be
> constructed given a new list of children. Actually, the implementation is
> quite simple in most cases and is a one-liner thanks to the copy method
> present in Scala case classes. Note that we cannot abstract over the copy
> method, it’s generated by the compiler for case classes if no other type
> higher in the hierarchy defines it. For most concrete nodes, the
> implementation of {{withNewChildren}} looks like this:
>
> {{override def withNewChildren(newChildren: Seq[LogicalPlan]): LogicalPlan =
> copy(children = newChildren)}}
> The current {{withNewChildren}} method has two properties that we should
> preserve:
> * It returns the same instance if the provided children are the same as its
> children, i.e., it preserves referential equality.
> * It copies tags and maintains the origin links when a new copy is created.
> These properties are hard to enforce in the concrete node type
> implementation. Therefore, we propose a template method
> {{withNewChildrenInternal}} that should be rewritten by the concrete classes
> and let the {{withNewChildren}} method take care of referential equality and
> copying:
> {{override def withNewChildren(newChildren: Seq[LogicalPlan]): LogicalPlan =
> {}}
> {{ if (childrenTheSame(children, newChildren)) {}}
> {{ this}}
> {{ } else {}}
> {{ CurrentOrigin.withOrigin(origin) {}}
> {{ val res = withNewChildrenInternal(newChildren)}}
> {{ res.copyTagsFrom(this)}}
> {{ res}}
> {{ } }}
> {{ } }}
> {{ } }}
> With the refactoring done in a previous PR
> ([#31932|https://github.com/apache/spark/pull/31932]) most tree node types
> fall in one of the categories of {{Leaf}}, {{Unary}}, {{Binary}} or
> {{Ternary}}. These traits have a more efficient implementation for
> {{mapChildren}} and define a more specialized version of
> {{withNewChildrenInternal}} that avoids creating unnecessary lists. For
> example, the {{mapChildren}} method in {{UnaryLike}} is defined as follows:
> {{override final def mapChildren(f: T => T): T = {}}
> {{ val newChild = f(child)}}
> {{ if (newChild fastEquals child) {}}
> {{ this.asInstanceOf[T]}}
> {{ } else {}}
> {{ CurrentOrigin.withOrigin(origin) {}}
> {{ val res = withNewChildInternal(newChild)}}
> {{ res.copyTagsFrom(this.asInstanceOf[T])}}
> {{ res}}
> {{ }}}
> {{ }}}
> {{ }}}
> h4. Results
> With this PR, we have observed significant performance improvements in query
> compilation time, more specifically in the analysis and optimization phases.
> The table below shows the TPC-DS queries that had more than 25% speedup in
> compilation times. Biggest speedups are observed in queries with large query
> plans.
> ||Query||Speedup||
> |q4|29%|
> |q9|81%|
> |q14a|31%|
> |q14b|28%|
> |q22|33%|
> |q33|29%|
> |q34|25%|
> |q39|27%|
> |q41|27%|
> |q44|26%|
> |q47|28%|
> |q48|76%|
> |q49|46%|
> |q56|26%|
> |q58|43%|
> |q59|46%|
> |q60|50%|
> |q65|59%|
> |q66|46%|
> |q67|52%|
> |q69|31%|
> |q70|30%|
> |q96|26%|
> |q98|32%|
> h4. Binary incompatibility
> Making the {{withNewChildren}} abstract in {{TreeNode}} can potentially break
> the binary compatibility of the code compiled against older versions of
> Spark. This is a problem, for example, when users write custom expressions.
> Making \{{withNewChildren}}abstract is the right choice, since it forces all
> newly added expressions to Catalyst implement it in an efficient manner and
> will prevent future regression.
> Please note that we have not completely removed the old implementation and
> renamed it to {{legacyWithNewChildren}}. This method will be removed in the
> future and for now helps the transition. There are expressions such as
> {{UpdateFields}} that have a complex way of defining children. Writing
> {{withNewChildren}} for them requires refactoring the expression. For now,
> these expressions use the old, slow method. In a future PR we address these
> expressions.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]