[ https://issues.apache.org/jira/browse/SPARK-6924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
xukun updated SPARK-6924:
-------------------------
    Description:
In yarn-client mode, the client is deployed outside the cluster. When the network between the client and the cluster is broken, the driver loses all of its executors. The expected behavior is that the client returns and the application fails. In practice, the driver hangs, and the user cannot tell whether the application is still healthy. The driver should therefore return rather than hang.

The solution: in the HeartbeatReceiver thread, check at a fixed rate whether any executor has sent a heartbeat to the driver. If no executor is sending heartbeats to the driver, close the SparkContext.

  was:
In yarn-client mode, the client is deployed outside the cluster. When the network between the client and the cluster is broken, the driver hangs. In this situation, the user cannot tell whether the application is still healthy. So, when no executor sends heartbeats to the driver, we should close the SparkContext, so that the user learns the application has failed instead of being left with a hung driver.

> driver hangs when net is broken
> -------------------------------
>
>                 Key: SPARK-6924
>                 URL: https://issues.apache.org/jira/browse/SPARK-6924
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: xukun
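
The check proposed in the description (at a fixed rate, see whether any executor is still sending heartbeats, and shut the context down if none is) could look roughly like the sketch below. This is only an illustration of the idea, not Spark's HeartbeatReceiver code: `HeartbeatMonitor`, `record_heartbeat`, `check`, the `timeout_s` parameter, and the `on_all_lost` callback are all hypothetical names, and the callback merely stands in for closing the SparkContext.

```python
import time
from typing import Callable, Dict


class HeartbeatMonitor:
    """Hypothetical driver-side monitor; an illustration, not Spark's implementation."""

    def __init__(
        self,
        timeout_s: float,
        on_all_lost: Callable[[], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.timeout_s = timeout_s      # assumed: silence longer than this means "lost"
        self.on_all_lost = on_all_lost  # assumed: stands in for closing the SparkContext
        self.clock = clock              # injectable so the logic can be tested without sleeping
        self.last_seen: Dict[str, float] = {}
        self.started_at = clock()
        self.stopped = False

    def record_heartbeat(self, executor_id: str) -> None:
        """Remember when this executor last reported in."""
        self.last_seen[executor_id] = self.clock()

    def check(self) -> bool:
        """Run one periodic check; return True if the shutdown callback fired."""
        if self.stopped:
            return False
        now = self.clock()
        # Most recent heartbeat from any executor, or the start time if none arrived yet.
        latest = max(self.last_seen.values(), default=self.started_at)
        if now - latest > self.timeout_s:
            # No executor has reported within the timeout: stop instead of hanging.
            self.stopped = True
            self.on_all_lost()
            return True
        return False
```

In a real driver, `check` would be called from a scheduled task at the fixed rate mentioned above. The timeout has to be comfortably larger than the heartbeat interval, otherwise a single delayed heartbeat would shut down a healthy application.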