wuyi created SPARK-34245:
----------------------------
Summary: Master may not remove the finished executor when Worker
fails to send ExecutorStateChanged
Key: SPARK-34245
URL: https://issues.apache.org/jira/browse/SPARK-34245
Project: Spark
Issue Type: Improvement
Components: Deploy, Spark Core
Affects Versions: 3.0.1, 2.4.7, 3.2.0, 3.1.1
Reporter: wuyi
If the Worker fails to send ExecutorStateChanged to the Master because of an
error, e.g., a temporary network error, the Master cannot remove the finished
executor and still thinks the executor is alive. In the worst case, if that
executor is the application's only executor, the application can hang.
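
The failure mode described above can be illustrated with a small toy model.
This is only a sketch under stated assumptions: the Master, FlakyLink,
notify_once and notify_with_retry names are hypothetical and are not Spark's
actual classes or APIs, and the retry shown is just one possible mitigation,
not necessarily the change made for this issue.

```python
# Toy model only: every class and function here is hypothetical and
# is NOT Spark's Master/Worker implementation.

# State names chosen for illustration.
FINISHED_STATES = {"EXITED", "FAILED", "KILLED", "LOST"}


class Master:
    def __init__(self):
        self.alive_executors = set()

    def register(self, exec_id):
        self.alive_executors.add(exec_id)

    def on_executor_state_changed(self, exec_id, state):
        # The Master only forgets an executor when it is told it finished.
        if state in FINISHED_STATES:
            self.alive_executors.discard(exec_id)


class FlakyLink:
    """Drops the first `failures` messages, then delivers the rest."""

    def __init__(self, master, failures):
        self.master = master
        self.failures = failures

    def send(self, exec_id, state):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("temporary network error")
        self.master.on_executor_state_changed(exec_id, state)


def notify_once(link, exec_id, state):
    """Send a single notification; return True if it was delivered."""
    try:
        link.send(exec_id, state)
        return True
    except ConnectionError:
        return False


def notify_with_retry(link, exec_id, state, max_attempts=3):
    """Retry until delivered or attempts run out (stops at first success)."""
    return any(notify_once(link, exec_id, state) for _ in range(max_attempts))
```

With a single lost message, notify_once leaves "exec-0" in
master.alive_executors even though the executor has finished, which is the
stale state described in this issue. In the same toy setup,
notify_with_retry delivers the message on a later attempt and the executor
is removed.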