[ https://issues.apache.org/jira/browse/SPARK-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068531#comment-14068531 ]
Thomas Graves commented on SPARK-2604:
--------------------------------------

Also note that it shouldn't hang; it should fail after a certain number of retries. The number of AM retries is configured by the resource manager, and the allowed number of executor failures is configured as below (although this only works in yarn-cluster mode; there is a PR up to fix it in yarn-client mode, if that is what you are using):

private val maxNumExecutorFailures = sparkConf.getInt("spark.yarn.max.executor.failures",
  sparkConf.getInt("spark.yarn.max.worker.failures", math.max(args.numExecutors * 2, 3)))

> Spark Application hangs on yarn in edge case scenario of executor memory
> requirement
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-2604
>                 URL: https://issues.apache.org/jira/browse/SPARK-2604
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Twinkle Sachdeva
>
> In a yarn environment, let's say:
> MaxAM = maximum allocatable memory
> ExecMem = executor's memory
> If MaxAM > ExecMem but (MaxAM - ExecMem) < 384m, the upfront resource
> validation passes with respect to executor memory (it does not account for
> the 384m memory overhead) and the application master gets launched; but when
> resources are allocated and validated again, the containers are returned and
> the application appears to hang.
> The typical use case is asking for executor memory equal to the maximum
> allowed memory as per the yarn config.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
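The edge case in the quoted report comes down to simple arithmetic: the executor memory alone fits under YARN's maximum allocation, so the upfront check passes, but executor memory plus the 384 MB container overhead does not, so every allocated container is rejected and returned. A minimal sketch of that condition (the object and method names here are hypothetical, not from Spark; 384 MB is the overhead figure cited in the report):

```scala
object OverheadCheck {
  // Memory overhead (MB) added to every executor container request;
  // 384 MB is the figure from the report above.
  val MemoryOverheadMb = 384

  // Returns true when the request can never be satisfied: executor memory
  // alone fits under the cluster maximum (so upfront validation passes),
  // but executor memory plus overhead exceeds the maximum (so each
  // allocated container is rejected and returned -- the apparent hang).
  def willHang(maxAllocatableMb: Int, executorMemMb: Int): Boolean =
    executorMemMb <= maxAllocatableMb &&
      executorMemMb + MemoryOverheadMb > maxAllocatableMb

  def main(args: Array[String]): Unit = {
    // Typical trigger: ask for executor memory equal to the YARN maximum.
    println(OverheadCheck.willHang(maxAllocatableMb = 8192, executorMemMb = 8192)) // true
    println(OverheadCheck.willHang(maxAllocatableMb = 8192, executorMemMb = 4096)) // false
  }
}
```

With a hypothetical 8192 MB maximum allocation, requesting 8192 MB executors (the "ask for the maximum" case from the report) trips the condition, while 4096 MB leaves room for the overhead.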