[ https://issues.apache.org/jira/browse/SPARK-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068531#comment-14068531 ]
Thomas Graves commented on SPARK-2604:
--------------------------------------
Also note that it shouldn't hang; it should fail after a certain number of
retries. The number of AM retries is configured by the ResourceManager, and the
executor failure limit is configured by spark.yarn.max.executor.failures
(although this only works in yarn-cluster mode). There is a PR up to fix this in
client mode, if that is what you are using.
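    // Falls back to the legacy spark.yarn.max.worker.failures key, and then to
    // max(2 * numExecutors, 3) if neither key is set.
    private val maxNumExecutorFailures =
      sparkConf.getInt("spark.yarn.max.executor.failures",
        sparkConf.getInt("spark.yarn.max.worker.failures",
          math.max(args.numExecutors * 2, 3)))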
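To make the fallback order of the snippet above concrete, here is a minimal sketch that resolves the same limit outside Spark. It assumes a plain Map[String, String] stands in for SparkConf and that numExecutors is passed in directly instead of being read from args; the object name MaxExecutorFailures is hypothetical.

```scala
// Hypothetical helper: mirrors the fallback chain of maxNumExecutorFailures,
// using a plain Map as a stand-in for SparkConf (an assumption, not Spark's API).
object MaxExecutorFailures {
  def resolve(conf: Map[String, String], numExecutors: Int): Int =
    conf.get("spark.yarn.max.executor.failures")           // preferred key
      .orElse(conf.get("spark.yarn.max.worker.failures"))  // legacy key
      .map(_.toInt)
      .getOrElse(math.max(numExecutors * 2, 3))            // default

  def main(args: Array[String]): Unit = {
    println(resolve(Map.empty, 1))  // 3: the floor of 3 applies
    println(resolve(Map.empty, 10)) // 20: twice the executor count
    println(resolve(Map("spark.yarn.max.worker.failures" -> "7"), 10)) // 7
    println(resolve(Map(
      "spark.yarn.max.executor.failures" -> "5",
      "spark.yarn.max.worker.failures" -> "7"), 10)) // 5: newer key wins
  }
}
```

For example, with 10 executors and neither key set, the limit is 20; setting either key overrides the default, and the executor key takes precedence over the worker key.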
> Spark Application hangs on yarn in edge case scenario of executor memory
> requirement
> ------------------------------------------------------------------------------------
>
> Key: SPARK-2604
> URL: https://issues.apache.org/jira/browse/SPARK-2604
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.0.0
> Reporter: Twinkle Sachdeva
>
> In a YARN environment, let:
> MaxAM = maximum allocatable memory
> ExecMem = executor memory
> If (MaxAM >= ExecMem && (MaxAM - ExecMem) < 384m), the up-front maximum-resource
> validation of executor memory does not catch the problem, and the application
> master gets launched. But when containers are allocated and validated again
> (each one needing ExecMem plus the 384m overhead), they are returned, and the
> application appears to hang.
> A typical case is asking for executor memory equal to the maximum allowed
> memory in the YARN configuration.
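The edge case can be illustrated with a small sketch. It assumes, as the description implies, that the up-front check compares executor memory alone against the maximum allocation, while each container request adds a fixed 384m overhead; the object and method names are hypothetical, not Spark's.

```scala
// Hypothetical model of the two checks described in the issue (not Spark code).
object ExecutorMemoryCheck {
  // Assumed per-container overhead in MB, taken from the 384m in the report.
  val MemoryOverheadMb = 384

  // Assumed up-front check: executor memory alone against the maximum.
  def passesNaiveCheck(execMemMb: Int, maxAllocMb: Int): Boolean =
    execMemMb <= maxAllocMb

  // A container request must fit executor memory plus the overhead.
  def fitsInContainer(execMemMb: Int, maxAllocMb: Int): Boolean =
    execMemMb + MemoryOverheadMb <= maxAllocMb

  // The reported hang: the naive check passes, but no container can be granted.
  def hitsEdgeCase(execMemMb: Int, maxAllocMb: Int): Boolean =
    passesNaiveCheck(execMemMb, maxAllocMb) && !fitsInContainer(execMemMb, maxAllocMb)

  def main(args: Array[String]): Unit = {
    println(hitsEdgeCase(8192, 8192)) // true: ExecMem == MaxAM
    println(hitsEdgeCase(8000, 8192)) // true: gap of 192m is under 384m
    println(hitsEdgeCase(7808, 8192)) // false: 7808 + 384 == 8192 fits
    println(hitsEdgeCase(9000, 8192)) // false: rejected up front
  }
}
```

Under these assumptions, validating ExecMem + 384m (rather than ExecMem alone) before launching the application master would reject the request immediately instead of letting it wait on containers that can never be granted.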
--
This message was sent by Atlassian JIRA
(v6.2#6252)