[ https://issues.apache.org/jira/browse/SPARK-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068531#comment-14068531 ]

Thomas Graves commented on SPARK-2604:
--------------------------------------

Also note that it shouldn't hang; it should fail after a certain number of 
retries. The number of AM retries is configured by the resource manager; the 
executor failure limit is set as shown below (although this only works in 
yarn-cluster mode). There is a PR up to fix this in client mode, if that is 
what you are using.

private val maxNumExecutorFailures =
  sparkConf.getInt("spark.yarn.max.executor.failures",
    sparkConf.getInt("spark.yarn.max.worker.failures",
      math.max(args.numExecutors * 2, 3)))
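
To make the fallback order in that snippet explicit, here is a minimal standalone sketch. It uses a plain Map in place of SparkConf, and the names FailureLimit, getInt, and maxExecutorFailures are hypothetical helpers for illustration, not Spark APIs. The assumption is that the lookup behaves like the nested getInt calls above: the executor key wins, then the older worker key, then max(numExecutors * 2, 3).

```scala
object FailureLimit {
  // Hypothetical stand-in for SparkConf.getInt: use the configured value
  // if the key is present, otherwise fall back to the supplied default.
  def getInt(conf: Map[String, String], key: String, default: Int): Int =
    conf.get(key).map(_.trim.toInt).getOrElse(default)

  // Mirrors the nested lookup from the snippet above:
  // 1. spark.yarn.max.executor.failures, if set
  // 2. otherwise spark.yarn.max.worker.failures, if set
  // 3. otherwise max(numExecutors * 2, 3)
  def maxExecutorFailures(conf: Map[String, String], numExecutors: Int): Int =
    getInt(conf, "spark.yarn.max.executor.failures",
      getInt(conf, "spark.yarn.max.worker.failures",
        math.max(numExecutors * 2, 3)))
}
```

Under these assumptions, an application with no overrides and 4 executors would give up after 8 executor failures, and one with a single executor would give up after 3.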


> Spark Application hangs on yarn in edge case scenario of executor memory 
> requirement
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-2604
>                 URL: https://issues.apache.org/jira/browse/SPARK-2604
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Twinkle Sachdeva
>
> In a YARN environment, let's say:
> MaxAM = maximum allocatable memory
> ExecMem = executor's memory
> if (MaxAM > ExecMem && (MaxAM - ExecMem) > 384m)
>   then the maximum resource validation fails w.r.t. executor memory, and the 
> application master gets launched, but when resources are allocated and 
> validated again, they are returned and the application appears to hang.
> A typical use case is to ask for executor memory = maximum allowed memory as 
> per the YARN config.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
