Tian Gao created SPARK-54711:
--------------------------------

             Summary: Restart daemon if the executor can't build a connection 
with the worker
                 Key: SPARK-54711
                 URL: https://issues.apache.org/jira/browse/SPARK-54711
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 4.2.0
            Reporter: Tian Gao


When the executor asks the daemon to spawn a new worker, the executor tries to 
establish a connection without any timeout mechanism. If the worker runs into 
an issue and hangs before it can establish a connection with the executor, the 
executor will hang forever.

We already have a timeout mechanism for simple workers, and we should have a 
similar mechanism for daemon-based workers.
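
As a rough illustration of the idea (not Spark's actual implementation), the 
sketch below waits for a worker connection with a bounded timeout and invokes 
a restart callback if no worker connects in time. The names 
accept_with_timeout, timeout_s and on_timeout are hypothetical and exist only 
for this example; the real change would live in the executor-side code.

```python
import socket
from typing import Callable, Optional


def accept_with_timeout(
    server: socket.socket,
    timeout_s: float,
    on_timeout: Callable[[], None],
) -> Optional[socket.socket]:
    """Wait up to timeout_s seconds for a worker to connect.

    Returns the accepted connection, or None after calling on_timeout
    (a hypothetical hook, e.g. "restart the daemon") when nothing connects.
    """
    # Put the listening socket in timeout mode so accept() cannot block forever.
    server.settimeout(timeout_s)
    try:
        conn, _addr = server.accept()
        return conn
    except socket.timeout:
        # No worker connected in time: let the caller recover instead of hanging.
        on_timeout()
        return None
```

A caller could pass a callback that stops and relaunches the daemon, then 
retry the worker request a limited number of times before failing the task.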



