This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.4 by this push:
new e4476c60782 [SPARK-44136][SS] Fixed an issue that StateManager may get
materialized in executor instead of driver in FlatMapGroupsWithStateExec
e4476c60782 is described below
commit e4476c60782b153cc9639497e6653f51c39401b1
Author: bogao007 <[email protected]>
AuthorDate: Thu Jun 22 09:26:32 2023 +0900
[SPARK-44136][SS] Fixed an issue that StateManager may get materialized in
executor instead of driver in FlatMapGroupsWithStateExec
### What changes were proposed in this pull request?
Fixed an issue that StateManager may get materialized in executor instead
of driver in FlatMapGroupsWithStateExec. The ticket that brought this issue:
https://issues.apache.org/jira/browse/SPARK-40411
The basic idea is to maintain the `stateManager` as `lazy val` but
initialize it earlier in the `doExecute()` to force a lazy init at driver.
### Why are the changes needed?
Because without this change, the StateManager in FlatMapGroupsWithStateExec
may get materialized in executor instead of driver which would cause unexpected
behavior.
### Does this PR introduce _any_ user-facing change?
Yes
### How was this patch tested?
It's hard to write a unit test for this since it involves in both driver
and executor which is hard to simulate through a unit test.
Closes #41693 from bogao007/SPARK-44136.
Authored-by: bogao007 <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit fb4d6d9db27d0e9642de33d5b3f9915b334ee02c)
Signed-off-by: Hyukjin Kwon <[email protected]>
---
.../spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala | 1 +
1 file changed, 1 insertion(+)
diff --git
a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala
b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala
index 760681e81c9..783226a8060 100644
---
a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala
+++
b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala
@@ -183,6 +183,7 @@ trait FlatMapGroupsWithStateExecBase
}
override protected def doExecute(): RDD[InternalRow] = {
+ stateManager // force lazy init at driver
metrics // force lazy init at driver
// Throw errors early if parameters are not as expected
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]