Hi All,

This is follow-up work to [SPARK-24374
<https://issues.apache.org/jira/browse/SPARK-24374>] SPIP: Support Barrier
Execution Mode in Apache Spark. The design doc is at:
https://docs.google.com/document/d/1r07-vU5JTH6s1jJ6azkmK0K5it6jwpfO6b_K3mJmxR4/edit?usp=sharing

We need to provide a communication barrier function to help coordinate
tasks within a barrier stage, which ML/DL workloads frequently require.
Similar to the MPI_Barrier function in MPI, a barrier() call blocks until
all tasks in the same stage have reached it. The design doc proposes
implementing barrier() on top of Spark's netty-based RPC framework. It
introduces a new driver-side BarrierCoordinator, a new
BarrierCoordinatorMessage, and a new config to handle timeouts.
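
To make the intended semantics concrete, here is a rough, in-process sketch
of the blocking behaviour described above. It is only a toy model under
stated assumptions: it uses Python threads in place of Spark tasks and a
condition variable in place of RPC, and the names ToyBarrierCoordinator,
num_tasks, timeout_s and run_tasks are hypothetical. It is not the API or
the implementation proposed in the design doc.

```python
import threading


class ToyBarrierCoordinator:
    """Hypothetical stand-in for a driver-side coordinator (not Spark code)."""

    def __init__(self, num_tasks, timeout_s):
        self.num_tasks = num_tasks    # tasks that must arrive per round
        self.timeout_s = timeout_s    # assumed timeout setting, in seconds
        self._cond = threading.Condition()
        self._arrived = 0             # arrivals in the current round
        self._epoch = 0               # number of completed rounds

    def barrier(self):
        """Block until num_tasks callers have arrived, or raise on timeout."""
        with self._cond:
            epoch = self._epoch
            self._arrived += 1
            if self._arrived == self.num_tasks:
                # Last arrival: start a new round and release the waiters.
                self._arrived = 0
                self._epoch += 1
                self._cond.notify_all()
                return
            # wait_for returns False if the timeout expires first.
            released = self._cond.wait_for(
                lambda: self._epoch != epoch, timeout=self.timeout_s
            )
            if not released:
                raise TimeoutError("barrier() timed out waiting for other tasks")


def run_tasks(num_tasks=4, timeout_s=5.0):
    """Run num_tasks threads through one barrier() and return an event log."""
    coordinator = ToyBarrierCoordinator(num_tasks, timeout_s)
    log = []
    log_lock = threading.Lock()

    def task(task_id):
        with log_lock:
            log.append(("before", task_id))
        coordinator.barrier()         # no task passes until all have arrived
        with log_lock:
            log.append(("after", task_id))

    threads = [threading.Thread(target=task, args=(i,)) for i in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Calling run_tasks(4) returns a log in which all four "before" entries come
ahead of any "after" entry, which is the ordering guarantee barrier() is
meant to give. The toy does not try to recover after a timeout; how a
timeout is handled in Spark is a question for the design doc.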

Please feel free to review and discuss the design proposal.

Thanks,
Xingbo
