[ https://issues.apache.org/jira/browse/SPARK-24374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16498488#comment-16498488 ]
Reynold Xin edited comment on SPARK-24374 at 6/1/18 7:57 PM:
-------------------------------------------------------------
That breaks end-to-end fault tolerance (FT), right? It also assumes there is
always a full-fledged cluster manager (CM) under the hood, which is
increasingly not the case for cloud deployments.
was (Author: rxin):
That breaks end to end FT right?
> SPIP: Support Barrier Scheduling in Apache Spark
> ------------------------------------------------
>
> Key: SPARK-24374
> URL: https://issues.apache.org/jira/browse/SPARK-24374
> Project: Spark
> Issue Type: Epic
> Components: ML, Spark Core
> Affects Versions: 3.0.0
> Reporter: Xiangrui Meng
> Assignee: Xiangrui Meng
> Priority: Major
> Labels: SPIP
> Attachments: SPIP_ Support Barrier Scheduling in Apache Spark.pdf
>
>
> (See details in the linked/attached SPIP doc.)
> {quote}
> The proposal here is to add a new scheduling model to Apache Spark so users
> can properly embed distributed DL training as a Spark stage to simplify the
> distributed training workflow. For example, Horovod uses MPI to implement
> all-reduce to accelerate distributed TensorFlow training. The computation
> model is different from MapReduce used by Spark. In Spark, a task in a stage
> doesn’t depend on any other tasks in the same stage, and hence it can be
> scheduled independently. In MPI, all workers start at the same time and pass
> messages around. To embed this workload in Spark, we need to introduce a new
> scheduling model, tentatively named “barrier scheduling”, which launches
> tasks at the same time and provides users enough information and tooling to
> embed distributed DL training. Spark can also provide an extra layer of fault
> tolerance in case some tasks fail midway, where Spark would abort
> all tasks and restart the stage.
> {quote}
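
For readers following the thread, the scheduling semantics described in the
quoted proposal (all tasks of a stage start together, and a failure in any
task causes the whole stage to be rerun) can be illustrated with a minimal,
Spark-free sketch. Everything here is an assumption made for illustration:
`run_barrier_stage`, `flaky_task`, and `max_attempts` are hypothetical names,
threads stand in for executors, and this is not the API proposed in the SPIP.
Note also that Python threads cannot be killed, so the sketch discards a
failed attempt's results rather than truly aborting running tasks.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_barrier_stage(task_fn, num_tasks, max_attempts=3):
    """Hypothetical helper: run num_tasks tasks that all start together.

    If any task fails, the attempt's results are discarded and the whole
    stage is rerun, up to max_attempts times.
    """
    for attempt in range(1, max_attempts + 1):
        barrier = threading.Barrier(num_tasks)

        def wrapped(task_id, barrier=barrier, attempt=attempt):
            try:
                # Block until every task of the stage has been launched.
                barrier.wait(timeout=10)
                return task_fn(task_id, attempt)
            except Exception:
                # Break the barrier so no peer is left waiting forever.
                barrier.abort()
                raise

        # One worker per task: a barrier stage needs all slots at once.
        with ThreadPoolExecutor(max_workers=num_tasks) as pool:
            futures = [pool.submit(wrapped, i) for i in range(num_tasks)]
            try:
                return [f.result() for f in futures], attempt
            except Exception:
                continue  # stage-level retry: rerun every task

    raise RuntimeError(f"stage failed after {max_attempts} attempts")


def flaky_task(task_id, attempt):
    # Simulated worker failure: task 2 fails on the first attempt only.
    if attempt == 1 and task_id == 2:
        raise RuntimeError("simulated worker failure")
    return task_id * 10


results, attempts = run_barrier_stage(flaky_task, num_tasks=4)
print(results, attempts)  # [0, 10, 20, 30] 2
```

The pool is sized to `num_tasks` on purpose: if fewer slots than tasks were
available, the tasks already running would wait at the barrier for peers that
could never be scheduled, which is why a barrier stage must be launched
all at once or not at all.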