[ https://issues.apache.org/jira/browse/SPARK-48997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jungtaek Lim resolved SPARK-48997.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 47475
[https://github.com/apache/spark/pull/47475]
> Maintenance thread pool error should not cause the entire executor to crash
> ---------------------------------------------------------------------------
>
> Key: SPARK-48997
> URL: https://issues.apache.org/jira/browse/SPARK-48997
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 4.0.0
> Reporter: Neil Ramaswamy
> Assignee: Neil Ramaswamy
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Today, it's possible for an exception within a thread in the maintenance pool
> to cause the entire executor to crash. Here's how:
> # An error occurs in a maintenance pool thread
> # It gets passed to the maintenance task thread, which `throw`s it
> # That gets caught by `onError`, which `.stop()`s the maintenance thread pool
> # If any of the maintenance pool threads are waiting on a lock, they will
> receive an `InterruptedException` (this happens if they are checking whether
> their state store instance is active)
> # This `InterruptedException` is not caught, because `NonFatal` does not
> match `InterruptedException`
> # This uncaught exception bubbles all the way to the
> `SparkUncaughtExceptionHandler`, causing the executor to exit
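Steps 4 to 6 hinge on Scala's `scala.util.control.NonFatal`, which deliberately does not match `InterruptedException`. The standalone sketch that follows is not Spark code; `NonFatalDemo` and `runGuarded` are hypothetical names. It shows a `case NonFatal(e)` handler swallowing an ordinary exception while letting an `InterruptedException` escape to its caller, which on an executor thread would be the path toward the uncaught-exception handler.

```scala
import scala.util.control.NonFatal

// Illustrative sketch only (hypothetical names, not Spark's maintenance code).
object NonFatalDemo {
  // Runs `body`, swallowing only errors that NonFatal classifies as non-fatal.
  // Anything else, including InterruptedException, propagates to the caller.
  def runGuarded(body: => Unit): String =
    try {
      body
      "ok"
    } catch {
      case NonFatal(e) => s"caught: ${e.getClass.getSimpleName}"
    }

  def main(args: Array[String]): Unit = {
    println(NonFatal(new RuntimeException("boom"))) // prints true
    println(NonFatal(new InterruptedException()))   // prints false

    // An ordinary exception is handled inside runGuarded.
    println(runGuarded(throw new IllegalStateException("boom"))) // prints caught: IllegalStateException

    // NonFatal does not match InterruptedException, so it escapes runGuarded.
    val escaped =
      try { runGuarded(throw new InterruptedException()); false }
      catch { case _: InterruptedException => true }
    println(s"escaped: $escaped") // prints escaped: true
  }
}
```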
> A better fix is to modify the maintenance thread pool to `unload` only the
> providers that experience errors, rather than stopping the entire thread pool.
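A minimal sketch of the proposed approach, under a simplified model: `Provider`, `MaintenancePool`, `load`, `unload`, and `runMaintenanceRound` are hypothetical names and do not reflect Spark's actual state store API or the code merged in the pull request. Each maintenance task catches non-fatal errors itself and unloads only the provider that failed, so the pool is never stopped and no other thread is interrupted.

```scala
import java.util.concurrent.{ConcurrentHashMap, Executors, TimeUnit}
import scala.jdk.CollectionConverters._
import scala.util.control.NonFatal

// Hypothetical model of a state store provider (not Spark's API).
trait Provider {
  def id: String
  def doMaintenance(): Unit
}

// Sketch of a maintenance pool that isolates failures per provider.
class MaintenancePool(numThreads: Int) {
  private val pool = Executors.newFixedThreadPool(numThreads)

  // Providers currently loaded on this (simulated) executor, keyed by id.
  val loaded = new ConcurrentHashMap[String, Provider]()

  def load(p: Provider): Unit = loaded.put(p.id, p)

  // Drop only the failing provider; the pool and other providers are untouched.
  private def unload(id: String): Unit = loaded.remove(id)

  // Submit one maintenance task per loaded provider and wait for all of them.
  def runMaintenanceRound(): Unit = {
    val futures = loaded.values().asScala.toList.map { p =>
      pool.submit(new Runnable {
        override def run(): Unit =
          try p.doMaintenance()
          catch {
            // Non-fatal error: unload this provider instead of stopping the pool.
            case NonFatal(_) => unload(p.id)
          }
      })
    }
    futures.foreach(_.get())
  }

  def shutdown(): Unit = {
    pool.shutdown()
    pool.awaitTermination(10, TimeUnit.SECONDS)
  }
}

object MaintenanceDemo {
  // Builds a hypothetical provider whose maintenance optionally fails.
  def provider(name: String, fail: Boolean): Provider = new Provider {
    val id: String = name
    def doMaintenance(): Unit =
      if (fail) throw new IllegalStateException(s"$name failed")
  }

  def main(args: Array[String]): Unit = {
    val mp = new MaintenancePool(numThreads = 2)
    Seq(provider("a", fail = false), provider("b", fail = true), provider("c", fail = false))
      .foreach(mp.load)
    mp.runMaintenanceRound()
    println(mp.loaded.keySet().asScala.toList.sorted) // prints List(a, c)
    mp.runMaintenanceRound() // the pool is still usable for the next round
    mp.shutdown()
  }
}
```

Because the pool keeps running, later rounds still service the healthy providers. Fatal errors such as `InterruptedException` are still not swallowed in this sketch; they surface as an `ExecutionException` from `Future.get()` rather than being silently ignored.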
--
This message was sent by Atlassian Jira
(v8.20.10#820010)