This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new 1012967  [SPARK-35767][SQL] Avoid executing child plan twice in CoalesceExec
1012967 is described below

commit 1012967ace4c7bd4e5a6f59c6ea6eec45871f292
Author: Andy Grove <andygrov...@gmail.com>
AuthorDate: Tue Jun 15 11:59:21 2021 -0700

    [SPARK-35767][SQL] Avoid executing child plan twice in CoalesceExec

    ### What changes were proposed in this pull request?

    `CoalesceExec` needlessly calls `child.execute` twice when it could call it
    once and reuse the result. This only happens when `numPartitions == 1`.

    ### Why are the changes needed?

    It is more efficient to execute the child plan once rather than twice.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    There are no functional changes. This is just a performance optimization,
    so the existing tests should be sufficient to catch any regressions.

    Closes #32920 from andygrove/coalesce-exec-executes-twice.

    Authored-by: Andy Grove <andygrov...@gmail.com>
    Signed-off-by: Dongjoon Hyun <dongj...@apache.org>
---
 .../org/apache/spark/sql/execution/basicPhysicalOperators.scala | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
index b537040..8c51cde 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
@@ -724,12 +724,13 @@ case class CoalesceExec(numPartitions: Int, child: SparkPlan) extends UnaryExecN
   }

   protected override def doExecute(): RDD[InternalRow] = {
-    if (numPartitions == 1 && child.execute().getNumPartitions < 1) {
+    val rdd = child.execute()
+    if (numPartitions == 1 && rdd.getNumPartitions < 1) {
       // Make sure we don't output an RDD with 0 partitions, when claiming that we have a
       // `SinglePartition`.
       new CoalesceExec.EmptyRDDWithPartitions(sparkContext, numPartitions)
     } else {
-      child.execute().coalesce(numPartitions, shuffle = false)
+      rdd.coalesce(numPartitions, shuffle = false)
     }
   }

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
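In essence, the patch hoists the repeated `child.execute()` call into a local `val` so the child plan is triggered once and its result reused in both branches. A minimal standalone sketch of that pattern (using a hypothetical `childExecute` stand-in, not the real `CoalesceExec` code) might look like:

```scala
// Sketch only: `childExecute` is a hypothetical stand-in for the
// (potentially expensive) child.execute() call in CoalesceExec.
object CoalesceSketch {
  var executions = 0 // counts how many times the "child plan" runs

  // Stand-in for child.execute(); returns a fake partition count.
  def childExecute(): Int = { executions += 1; 2 }

  // Before the patch: the child may be executed twice.
  def doExecuteBefore(numPartitions: Int): String =
    if (numPartitions == 1 && childExecute() < 1) "EmptyRDD"
    else { childExecute(); "coalesced" } // second execution happens here

  // After the patch: execute once, bind the result, branch on it.
  def doExecuteAfter(numPartitions: Int): String = {
    val parts = childExecute() // executed exactly once, result reused
    if (numPartitions == 1 && parts < 1) "EmptyRDD" else "coalesced"
  }
}
```

The design point is simply that `doExecute` is the only call site, so binding the RDD once changes no behavior: both branches observe the same `RDD` reference that the two separate `child.execute()` calls would have produced.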