Tian Gao created SPARK-54711:
--------------------------------
Summary: Restart daemon if the executor can't build a connection with the worker
Key: SPARK-54711
URL: https://issues.apache.org/jira/browse/SPARK-54711
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 4.2.0
Reporter: Tian Gao
When the executor asks the daemon to spawn a new worker, the executor tries to
establish a connection without any timeout mechanism. If the worker runs into a
problem and hangs before it can establish a connection with the executor, the
executor hangs forever.

We already have a timeout mechanism for simple workers, and we should have a
similar mechanism for daemon-based workers.
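
The idea can be illustrated with a minimal, generic sketch. This is not Spark's
actual code: the function name wait_for_worker, the restart_daemon callback, and
the timeout value are all hypothetical, and the sketch only shows a bounded wait
on a listening socket followed by a daemon restart when no worker connects in
time.

```python
import socket


def wait_for_worker(server, timeout_s, restart_daemon):
    """Wait up to timeout_s seconds for a worker to connect.

    Hypothetical helper for illustration only. `server` is a listening
    socket, and `restart_daemon` is a caller-supplied callback that is
    assumed to kill and relaunch the daemon process.
    Returns the accepted connection, or None if the wait timed out.
    """
    # Bound the accept() call so a hung worker cannot block us forever.
    server.settimeout(timeout_s)
    try:
        conn, _addr = server.accept()
    except socket.timeout:
        # No worker connected in time: recover by restarting the daemon.
        restart_daemon()
        return None
    # Explicitly put the accepted connection in blocking mode.
    conn.settimeout(None)
    return conn
```

A caller would typically retry the spawn request after the restart, with a cap
on the number of attempts so that a persistently broken daemon surfaces as an
error instead of a retry loop.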