This is an automated email from the ASF dual-hosted git repository.
wenchen pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.5 by this push:
new 24f88b319c8 [SPARK-45205][SQL] CommandResultExec to override iterator
methods to avoid triggering multiple jobs
24f88b319c8 is described below
commit 24f88b319c88bfe55e8b2b683193a85842bdad88
Author: yorksity <[email protected]>
AuthorDate: Tue Oct 10 14:36:23 2023 +0800
[SPARK-45205][SQL] CommandResultExec to override iterator methods to avoid
triggering multiple jobs
### What changes were proposed in this pull request?
After SPARK-35378 was changed, the execution of statements such as ‘show
parititions test' became slower. The change point is that the execution process
changes from ExecutedCommandEnec to CommandResultExec, but ExecutedCommandExec
originally implemented the following method
override def executeToIterator(): Iterator[InternalRow] =
sideEffectResult.iterator
CommandResultExec is not rewritten, so when the hasNext method is executed,
a job process is created, resulting in increased time-consuming
### Why are the changes needed?
Improve performance when show partitions/tables.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existing tests should cover this.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #43270 from yorksity/SPARK-45205.
Authored-by: yorksity <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit c9c99222e828d556552694dfb48c75bf0703a2c4)
Signed-off-by: Wenchen Fan <[email protected]>
---
.../main/scala/org/apache/spark/sql/execution/CommandResultExec.scala | 2 ++
1 file changed, 2 insertions(+)
diff --git
a/sql/core/src/main/scala/org/apache/spark/sql/execution/CommandResultExec.scala
b/sql/core/src/main/scala/org/apache/spark/sql/execution/CommandResultExec.scala
index 5f38278d2dc..45e3e41ab05 100644
---
a/sql/core/src/main/scala/org/apache/spark/sql/execution/CommandResultExec.scala
+++
b/sql/core/src/main/scala/org/apache/spark/sql/execution/CommandResultExec.scala
@@ -81,6 +81,8 @@ case class CommandResultExec(
unsafeRows
}
+ override def executeToIterator(): Iterator[InternalRow] = unsafeRows.iterator
+
override def executeTake(limit: Int): Array[InternalRow] = {
val taken = unsafeRows.take(limit)
longMetric("numOutputRows").add(taken.size)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]