This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new 1012967  [SPARK-35767][SQL] Avoid executing child plan twice in CoalesceExec
1012967 is described below
commit 1012967ace4c7bd4e5a6f59c6ea6eec45871f292
Author: Andy Grove <[email protected]>
AuthorDate: Tue Jun 15 11:59:21 2021 -0700
[SPARK-35767][SQL] Avoid executing child plan twice in CoalesceExec
### What changes were proposed in this pull request?
`CoalesceExec` needlessly calls `child.execute` twice when it could call it once and reuse the result. This only happens when `numPartitions == 1`.
### Why are the changes needed?
It is more efficient to execute the child plan once rather than twice.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
There are no functional changes. This is just a performance optimization, so the existing tests should be sufficient to catch any regressions.
Closes #32920 from andygrove/coalesce-exec-executes-twice.
Authored-by: Andy Grove <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
.../org/apache/spark/sql/execution/basicPhysicalOperators.scala | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
index b537040..8c51cde 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
@@ -724,12 +724,13 @@ case class CoalesceExec(numPartitions: Int, child: SparkPlan) extends UnaryExecNode
   }
 
   protected override def doExecute(): RDD[InternalRow] = {
-    if (numPartitions == 1 && child.execute().getNumPartitions < 1) {
+    val rdd = child.execute()
+    if (numPartitions == 1 && rdd.getNumPartitions < 1) {
       // Make sure we don't output an RDD with 0 partitions, when claiming that we have a
       // `SinglePartition`.
       new CoalesceExec.EmptyRDDWithPartitions(sparkContext, numPartitions)
     } else {
-      child.execute().coalesce(numPartitions, shuffle = false)
+      rdd.coalesce(numPartitions, shuffle = false)
     }
   }
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]