[ https://issues.apache.org/jira/browse/SPARK-24817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567358#comment-16567358 ]

Erik Erlandson commented on SPARK-24817:
----------------------------------------

I have been looking at the use cases for barrier mode in the design doc. The 
primary story seems to be along the lines of using {{mapPartitions}} to:
 # write out any partitioned data (and sync)
 # execute some kind of ML logic (TF, etc.) (possibly syncing on stages here?)
 # optionally move back into "normal" Spark execution
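For concreteness, the three steps above might look something like the following sketch against the barrier execution API ({{RDD.barrier()}} and {{BarrierTaskContext}}); the {{writePartition}} and {{runMlLogic}} helpers are hypothetical placeholders, not real APIs:

```scala
import org.apache.spark.{BarrierTaskContext, SparkContext}

def runBarrierStage(sc: SparkContext): Unit = {
  val data = sc.parallelize(1 to 1000, numSlices = 4)

  data.barrier().mapPartitions { iter =>
    val ctx = BarrierTaskContext.get()

    writePartition(ctx.partitionId(), iter)    // 1. write out partitioned data
    ctx.barrier()                              //    ...and sync: blocks until every
                                               //    task in the stage reaches here
    val result = runMlLogic(ctx.partitionId()) // 2. run ML logic (TF, etc.)
    ctx.barrier()                              //    optional per-phase sync

    Iterator(result)                           // 3. hand results back to Spark
  }.collect()
}

// Hypothetical helpers, stubbed purely for illustration.
def writePartition(pid: Int, iter: Iterator[Int]): Unit = ()
def runMlLogic(pid: Int): String = s"model-shard-$pid"
```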

My mental model has been that the value proposition for Hydrogen is primarily a 
convergence argument: it is easier not to have to leave a Spark workflow to 
execute something like TF using some other toolchain. But OTOH, the Spark 
programmer has to write out the partitioned data and then invoke ML tooling 
like TF regardless. Does the gain in convenience pay for the cost in complexity 
of absorbing new clustering & scheduling models into Spark, along with other 
consequences (for example SPARK-24615), compared to the "null hypothesis" of 
writing partition data, then using ML-specific clustering toolchains (kubeflow, 
for example), and consuming the resulting products in Spark afterward?

> Implement BarrierTaskContext.barrier()
> --------------------------------------
>
>                 Key: SPARK-24817
>                 URL: https://issues.apache.org/jira/browse/SPARK-24817
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Jiang Xingbo
>            Priority: Major
>
> Implement BarrierTaskContext.barrier(), to support global sync between all 
> the tasks in a barrier stage. The global sync shall finish immediately once 
> all tasks in the same barrier stage reach the same barrier.


