[ https://issues.apache.org/jira/browse/SPARK-24375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569363#comment-16569363 ]
Mridul Muralidharan commented on SPARK-24375:
---------------------------------------------
{quote}We've thought hard about the issue and don't feel we can make it work unless we
force users to explicitly set a number in a barrier() call (which is actually not a
good idea, because it adds more burden to managing the code).{quote}
I am not sure where the additional burden lies.
Make it an optional parameter to barrier():
* If not defined, barrier() behaves exactly as it does right now.
* If specified, fail the stage if different tasks in the stage end up waiting on
different barrier names (or some pass a name and others don't).
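A minimal sketch of the proposed check, simulating only the coordinator side in plain Python rather than calling Spark. It assumes a hypothetical barrier(name=None) signature; BarrierNameMismatch and check_barrier_round are illustrative names, not part of Spark's API. Once every task in the stage has reached a barrier round, the names they passed are compared: a single distinct value (all unnamed, or all the same name) lets the round proceed, anything else fails the stage.

```python
from typing import Dict, Optional


class BarrierNameMismatch(Exception):
    """Hypothetical error; raising it stands in for failing the stage."""


def check_barrier_round(requested: Dict[int, Optional[str]], num_tasks: int) -> Optional[str]:
    """Validate one barrier round after every task in the stage has arrived.

    `requested` maps partition id -> the optional name passed to a
    hypothetical barrier(name=None) call; None means no name was given.
    Returns the agreed name, or None for an unnamed (current-style) barrier.
    """
    if len(requested) != num_tasks:
        raise ValueError(f"only {len(requested)} of {num_tasks} tasks have arrived")
    names = set(requested.values())
    # One distinct value means every task agrees: all unnamed (today's
    # behavior) or all using the same name. None counts as a value, so a
    # mix of named and unnamed calls falls through to the failure below.
    if len(names) == 1:
        return next(iter(names))
    detail = ", ".join(f"partition {p}: {n!r}" for p, n in sorted(requested.items()))
    raise BarrierNameMismatch(f"tasks disagree on the barrier name ({detail})")
```

For example, {0: None, 1: None} passes as today's unnamed barrier, {0: "init", 1: "init"} returns "init", and both {0: "init", 1: "epoch-1"} and {0: "init", 1: None} raise. Treating "no name" as just another value is what makes the mixed case fail without a special branch.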
In the example use cases I have seen, there are usually partition-specific code paths
(if partition 0, do some initialization/teardown, etc.), which result in
divergent code paths and so increase the potential for this issue.
When that happens, it is very difficult to reason about what state the tasks are in.
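To illustrate the divergent-code-path hazard, the sketch below models each task only by the sequence of barrier calls it makes, with partition 0 taking an extra initialization barrier. It assumes that unnamed barriers pair the k-th barrier() call of each task with the k-th call of every other task, and it again uses a hypothetical named barrier; task_barrier_calls and simulate are illustrative helpers, not Spark code.

```python
from typing import Dict, List, Optional, Tuple

# Outcome of a simulated stage: (status, round index where it stopped).
Outcome = Tuple[str, int]


def task_barrier_calls(partition: int, epochs: int, use_names: bool) -> List[Optional[str]]:
    """Record the barrier calls a hypothetical task body would make, in order.

    Partition 0 takes a partition-specific path with an extra
    initialization barrier, so its call sequence diverges from the others.
    """
    calls: List[Optional[str]] = []
    if partition == 0:
        calls.append("init" if use_names else None)
    for epoch in range(epochs):
        calls.append(f"epoch-{epoch}" if use_names else None)
    return calls


def simulate(calls: Dict[int, List[Optional[str]]], check_names: bool) -> Outcome:
    """Pair the k-th barrier call of every task, as a positional barrier would."""
    rounds = max(len(seq) for seq in calls.values())
    for k in range(rounds):
        if any(len(seq) <= k for seq in calls.values()):
            # At least one task has already finished, so the tasks still
            # calling barrier() would block on a round that never completes.
            return ("hang", k)
        names = {seq[k] for seq in calls.values()}
        if check_names and len(names) > 1:
            # Proposed behavior: disagreeing names fail the stage immediately.
            return ("stage failed", k)
    return ("completed", rounds)


if __name__ == "__main__":
    for use_names in (False, True):
        calls = {p: task_barrier_calls(p, epochs=2, use_names=use_names) for p in range(2)}
        print("named" if use_names else "unnamed", simulate(calls, check_names=use_names))
```

With two partitions and two epochs, the unnamed version silently pairs partition 0's init barrier with partition 1's epoch-0 barrier and only surfaces a problem at round 2, where partition 1 has finished and partition 0 is left waiting on a barrier no other task will reach ("hang", 2). The named version fails the stage at round 0 ("stage failed", 0), which is exactly where the code paths diverged.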
> Design sketch: support barrier scheduling in Apache Spark
> ---------------------------------------------------------
>
> Key: SPARK-24375
> URL: https://issues.apache.org/jira/browse/SPARK-24375
> Project: Spark
> Issue Type: Story
> Components: Spark Core
> Affects Versions: 3.0.0
> Reporter: Xiangrui Meng
> Assignee: Jiang Xingbo
> Priority: Major
>
> This task is to outline a design sketch for the barrier scheduling SPIP
> discussion. It doesn't need to be a complete design before the vote. But it
> should at least cover both Scala/Java and PySpark.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)