Imran Rashid created SPARK-17675:
------------------------------------

             Summary: Add Blacklisting of Executors & Nodes within one TaskSet
                 Key: SPARK-17675
                 URL: https://issues.apache.org/jira/browse/SPARK-17675
             Project: Spark
          Issue Type: Task
          Components: Scheduler
    Affects Versions: 2.0.0
            Reporter: Imran Rashid
            Assignee: Imran Rashid

This is a step along the way to SPARK-8425; see the design doc on that jira for a complete discussion of blacklisting.

To enable incremental review, the first step proposed here is to expand the blacklisting within tasksets. Specifically, this will enable blacklisting for:

* (task, executor) pairs (this already exists via an undocumented config)
* (task, node)
* (taskset, executor)
* (taskset, node)

Adding (task, node) is critical to making Spark tolerant of one bad disk in a cluster without requiring careful tuning of "spark.task.maxFailures". Without it, a task can be retried on different executors of the same bad node until it exhausts its allowed failures and the job is aborted. The other additions are also important to avoid many misleading task failures and long scheduling delays when there is one bad node on a large cluster.
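
To make the four levels listed above concrete, here is a minimal Python sketch of a per-taskset failure tracker. It is not Spark's implementation: the class name, method names, and all four thresholds are hypothetical and chosen only for illustration. The design doc on SPARK-8425 remains the reference for the actual behavior.

```python
from collections import defaultdict


class TaskSetBlacklistSketch:
    """Toy failure tracker for one taskset (illustrative, not Spark code)."""

    def __init__(self, max_task_attempts_per_executor=1,
                 max_task_attempts_per_node=2,
                 max_failed_tasks_per_executor=2,
                 max_failed_executors_per_node=2):
        # All four thresholds are hypothetical names and defaults.
        self.max_task_attempts_per_executor = max_task_attempts_per_executor
        self.max_task_attempts_per_node = max_task_attempts_per_node
        self.max_failed_tasks_per_executor = max_failed_tasks_per_executor
        self.max_failed_executors_per_node = max_failed_executors_per_node
        # executor -> task -> number of failures of that task on that executor
        self.failures = defaultdict(lambda: defaultdict(int))
        # node -> executors on that node that have had at least one failure
        self.node_executors = defaultdict(set)

    def record_failure(self, task, executor, node):
        self.failures[executor][task] += 1
        self.node_executors[node].add(executor)

    def is_executor_blacklisted_for_task(self, executor, task):
        # (task, executor): this task failed too often on this executor.
        count = self.failures.get(executor, {}).get(task, 0)
        return count >= self.max_task_attempts_per_executor

    def is_node_blacklisted_for_task(self, node, task):
        # (task, node): this task failed too often across the node's executors.
        total = sum(self.failures.get(e, {}).get(task, 0)
                    for e in self.node_executors.get(node, ()))
        return total >= self.max_task_attempts_per_node

    def is_executor_blacklisted_for_taskset(self, executor):
        # (taskset, executor): too many distinct tasks failed on this executor.
        distinct = len(self.failures.get(executor, {}))
        return distinct >= self.max_failed_tasks_per_executor

    def is_node_blacklisted_for_taskset(self, node):
        # (taskset, node): too many executors on the node are blacklisted.
        bad = sum(1 for e in self.node_executors.get(node, ())
                  if self.is_executor_blacklisted_for_taskset(e))
        return bad >= self.max_failed_executors_per_node

    def can_schedule(self, task, executor, node):
        return not (self.is_executor_blacklisted_for_task(executor, task)
                    or self.is_node_blacklisted_for_task(node, task)
                    or self.is_executor_blacklisted_for_taskset(executor)
                    or self.is_node_blacklisted_for_taskset(node))


bl = TaskSetBlacklistSketch()
bl.record_failure(task=0, executor="exec-1", node="host-A")
bl.record_failure(task=0, executor="exec-2", node="host-A")
# Task 0 is now kept off every executor on host-A, including a fresh one,
# while other tasks may still run there.
print(bl.can_schedule(0, "exec-3", "host-A"))
print(bl.can_schedule(1, "exec-3", "host-A"))
```

In this sketch the (task, node) check is what keeps a task away from a node with a bad disk after a bounded number of attempts, even when the retry would land on a different executor of that node.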