Jialin LIu created SPARK-26197:
----------------------------------
Summary: Spark master fails to detect driver process pause
Key: SPARK-26197
URL: https://issues.apache.org/jira/browse/SPARK-26197
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.3.2
Reporter: Jialin LIu
I was using Spark 2.3.2 with a standalone cluster and submitted a job in
cluster mode. After submitting the job, I deliberately paused the driver
process (with the shell command "kill -stop (driver process id)") to see
whether the master could detect the problem. The result shows that the driver
never stops. All the executors try to talk back to the driver and give up
after 10 minutes. The master detects the executor failures and tries to
assign new executor processes to redo the job. Each new executor tries to
create an RPC connection to the driver and fails after 2 minutes. The master
endlessly spawns new executors without detecting the driver failure.
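
The pause step described above can be illustrated outside Spark with a small
sketch. It is not taken from Spark's code and does not show how the master
actually monitors drivers. It only demonstrates, on a POSIX system, that a
process paused with SIGSTOP still exists and has not exited, so a check that
only looks for process exit would not notice the pause. The function name
pause_and_probe and the sleeping child process (a stand-in for the driver JVM)
are assumptions made for illustration.

```python
import os
import signal
import subprocess
import sys
import time


def pause_and_probe(pause_seconds=0.5):
    """Pause a stand-in 'driver' with SIGSTOP and report whether it still exists.

    Hypothetical helper for illustration only; POSIX systems only.
    """
    # Stand-in for the driver JVM: a child process that just sleeps.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    try:
        # Same effect as running `kill -STOP <pid>` from a shell.
        os.kill(child.pid, signal.SIGSTOP)
        time.sleep(pause_seconds)
        # poll() returns None while the child has not exited.
        # A stopped process has not exited, so this stays None.
        return child.poll() is None
    finally:
        # SIGKILL is delivered even to a stopped process, so cleanup works.
        child.kill()
        child.wait()


if __name__ == "__main__":
    print("paused process still exists:", pause_and_probe())
```

Under these assumptions, an exit-based liveness check keeps reporting the
paused process as running, which matches the behavior reported here; noticing
the pause would need some form of responsiveness check, such as a timeout on
expected messages.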