This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new e081f06ea401 [SPARK-47121][CORE] Avoid RejectedExecutionExceptions
during StandaloneSchedulerBackend shutdown
e081f06ea401 is described below
commit e081f06ea401a2b6b8c214a36126583d35eaf55f
Author: Josh Rosen <[email protected]>
AuthorDate: Wed Feb 21 13:35:32 2024 -0800
[SPARK-47121][CORE] Avoid RejectedExecutionExceptions during
StandaloneSchedulerBackend shutdown
### What changes were proposed in this pull request?
This PR adds logic to avoid uncaught `RejectedExecutionException`s while
`StandaloneSchedulerBackend` is shutting down.
When the backend is shut down, its `stop()` method calls
`executorDelayRemoveThread.shutdownNow()`. After this point, though, it's
possible that its `StandaloneDriverEndpoint` might still process
`onDisconnected` events and those might trigger calls to schedule new tasks on
the `executorDelayRemoveThread`. This causes uncaught
`java.util.concurrent.RejectedExecutionException`s to be thrown in RPC threads.
This patch adds a `try-catch` to catch those exceptions and log a short
warning if the exceptions occur while the scheduler is stopping. This approach
is consistent with other similar code in Spark, including:
-
https://github.com/apache/spark/blob/9b53b803998001e4b706666e37e5f86f900a7430/core/src/main/scala/org/apache/spark/scheduler/TaskResultGetter.scala#L160-L163
-
https://github.com/apache/spark/blob/9b53b803998001e4b706666e37e5f86f900a7430/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala#L754-L756
### Why are the changes needed?
Remove log and exception noise.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No new tests: it is difficult to reliably reproduce the scenario that leads
to the log noise.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #45203 from JoshRosen/reduce-scheduler-backend-shutdown-noise.
Authored-by: Josh Rosen <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
.../spark/scheduler/cluster/StandaloneSchedulerBackend.scala | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git
a/core/src/main/scala/org/apache/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
b/core/src/main/scala/org/apache/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
index bd1cb164b4be..2150b996f058 100644
---
a/core/src/main/scala/org/apache/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
+++
b/core/src/main/scala/org/apache/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
@@ -18,7 +18,7 @@
package org.apache.spark.scheduler.cluster
import java.util.Locale
-import java.util.concurrent.{Semaphore, TimeUnit}
+import java.util.concurrent.{RejectedExecutionException, Semaphore, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.Future
@@ -343,8 +343,14 @@ private[spark] class StandaloneSchedulerBackend(
}
}
}
- executorDelayRemoveThread.schedule(removeExecutorTask,
- _executorRemoveDelay, TimeUnit.MILLISECONDS)
+ try {
+ executorDelayRemoveThread.schedule(removeExecutorTask,
+ _executorRemoveDelay, TimeUnit.MILLISECONDS)
+ } catch {
+ case _: RejectedExecutionException if stopping.get() =>
+ logWarning(
+ "Skipping onDisconnected RemoveExecutor call because the
scheduler is stopping")
+ }
}
}
}
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]