[ https://issues.apache.org/jira/browse/SPARK-16554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-16554:
------------------------------------

    Assignee: Jose Soltren  (was: Apache Spark)

> Spark should kill executors when they are blacklisted
> -----------------------------------------------------
>
>                 Key: SPARK-16554
>                 URL: https://issues.apache.org/jira/browse/SPARK-16554
>             Project: Spark
>          Issue Type: New Feature
>          Components: Scheduler
>            Reporter: Imran Rashid
>            Assignee: Jose Soltren
>
> SPARK-8425 will allow blacklisting faulty executors and nodes. However,
> these blacklisted executors will continue to run. This is bad for a few
> reasons:
> (1) Even if there is faulty hardware, if the cluster is under-utilized,
> Spark may be able to request another executor on a different node.
> (2) If there is a faulty disk (the most common case of faulty hardware),
> the cluster manager may be able to allocate another executor on the same
> node, if it can exclude the bad disk. (YARN will do this with its
> disk-health checker.)
> With dynamic allocation, this may seem less critical, as a blacklisted
> executor will stop running new tasks and eventually get reclaimed. However,
> if there is cached data on those executors, they will not be killed until
> {{spark.dynamicAllocation.cachedExecutorIdleTimeout}} expires, which is
> (effectively) infinite by default.
> Users may not *always* want to kill bad executors, so this must be
> configurable to some extent. At a minimum, it should be possible to enable /
> disable it; perhaps the executor should be killed after it has been
> blacklisted a configurable {{N}} times.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
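To make the proposal concrete, a configuration along the lines sketched in the description might look like the following {{spark-defaults.conf}} fragment. Note: {{spark.dynamicAllocation.cachedExecutorIdleTimeout}} is an existing Spark property mentioned in the issue; the blacklist kill switch and threshold property names below are illustrative placeholders for what this feature could expose, not confirmed Spark settings.

    # Illustrative sketch only -- property names for the proposed feature are
    # placeholders, not confirmed Spark configuration keys.

    # Proposed: enable/disable killing of blacklisted executors (the
    # "at a minimum" knob from the description).
    spark.blacklist.killBlacklistedExecutors   true

    # Proposed: kill an executor only after it has been blacklisted N times.
    spark.blacklist.killThreshold              3

    # Existing property: with dynamic allocation, executors holding cached
    # data are not reclaimed until this timeout expires (infinite by default,
    # which is why blacklisted executors with cached data linger).
    spark.dynamicAllocation.cachedExecutorIdleTimeout   30min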