[
https://issues.apache.org/jira/browse/SPARK-11120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apache Spark reassigned SPARK-11120:
------------------------------------
Assignee: (was: Apache Spark)
> maxNumExecutorFailures defaults to 3 under dynamic allocation
> -------------------------------------------------------------
>
> Key: SPARK-11120
> URL: https://issues.apache.org/jira/browse/SPARK-11120
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.5.1
> Reporter: Ryan Williams
> Priority: Minor
>
> With dynamic allocation, the {{spark.executor.instances}} config is 0, so [this
> line|https://github.com/apache/spark/blob/4ace4f8a9c91beb21a0077e12b75637a4560a542/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L66-L68]
> ends up setting {{maxNumExecutorFailures}} to {{3}}. For me this has caused large
> dynamic-allocation jobs with hundreds of executors to die because one bad node
> serially fails every executor that gets allocated on it.
> I think using {{spark.dynamicAllocation.maxExecutors}} would make the most sense
> here; I frequently run shells that scale between 1 and 1000 executors, so basing
> the default on {{spark.dynamicAllocation.minExecutors}} or
> {{spark.dynamicAllocation.initialExecutors}} would still leave me with a value
> that is lower than makes sense.
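> A rough sketch of the current default versus the suggested one (the exact
> expression in {{ApplicationMaster.scala}} may differ slightly from this, and the
> dynamic-allocation fallback below is only my proposal, not existing Spark code;
> {{MaxExecutorFailuresSketch}} is a made-up name for illustration):
> {code:scala}
> import org.apache.spark.SparkConf
>
> object MaxExecutorFailuresSketch {
>   def main(args: Array[String]): Unit = {
>     // A shell running with dynamic allocation: spark.executor.instances is unset.
>     val conf = new SparkConf()
>       .set("spark.dynamicAllocation.enabled", "true")
>       .set("spark.dynamicAllocation.maxExecutors", "1000")
>
>     // Roughly what 1.5.1 does: 2 * spark.executor.instances with a floor of 3,
>     // which collapses to 3 when spark.executor.instances is 0.
>     val current = conf.getInt("spark.yarn.max.executor.failures",
>       math.max(conf.getInt("spark.executor.instances", 0) * 2, 3))
>
>     // Proposed: when dynamic allocation is enabled, derive the default from
>     // spark.dynamicAllocation.maxExecutors instead.
>     val effectiveExecutors =
>       if (conf.getBoolean("spark.dynamicAllocation.enabled", false)) {
>         conf.getInt("spark.dynamicAllocation.maxExecutors", 0)
>       } else {
>         conf.getInt("spark.executor.instances", 0)
>       }
>     val proposed = conf.getInt("spark.yarn.max.executor.failures",
>       math.max(effectiveExecutors * 2, 3))
>
>     println(s"current default = $current, proposed default = $proposed")  // 3 vs 2000
>   }
> }
> {code}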