[
https://issues.apache.org/jira/browse/SPARK-24375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16492745#comment-16492745
]
Wenchen Fan commented on SPARK-24375:
-------------------------------------
On the PySpark side, we don't need to worry about the scheduler, because the
PySpark driver connects to a JVM driver, and all scheduling is done in the JVM
driver.
For the task barrier, one problem is that we launch a Python worker per task,
and the Python workers talk to the JVM executor via a socket. It's hard to
change that protocol to let the Python worker send a signal to the JVM executor
to request a sync. Instead, we can set up a Py4J server per task, and the
Python worker can send the barrier sync request via Py4J.
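To make the idea concrete, here is a rough sketch of the control flow, not
actual Spark or Py4J code. Threads stand in for the Python workers, and the
names BarrierCoordinator, TaskSyncEndpoint, request_sync and run_stage are all
made up for illustration. In a real design the per-task endpoint would be
reached over Py4J and the coordinator would live on the JVM side.

```python
import threading


class BarrierCoordinator:
    """Hypothetical stand-in for the JVM-side logic that releases all tasks together."""

    def __init__(self, num_tasks):
        self._barrier = threading.Barrier(num_tasks)

    def await_all(self, timeout=None):
        # Blocks until every task in the stage has requested a sync.
        self._barrier.wait(timeout)


class TaskSyncEndpoint:
    """Hypothetical per-task endpoint; assumed to be exposed via Py4J in a real design."""

    def __init__(self, task_id, coordinator):
        self.task_id = task_id
        self._coordinator = coordinator

    def request_sync(self, timeout=5.0):
        # The Python worker calls this instead of changing the socket protocol.
        self._coordinator.await_all(timeout)


def python_worker(endpoint, log, lock):
    """Simulated Python worker: do some work, sync, then continue."""
    with lock:
        log.append(("before", endpoint.task_id))
    endpoint.request_sync()
    with lock:
        log.append(("after", endpoint.task_id))


def run_stage(num_tasks):
    """Launch one simulated worker per task and return the ordered event log."""
    coordinator = BarrierCoordinator(num_tasks)
    log, lock = [], threading.Lock()
    threads = [
        threading.Thread(
            target=python_worker,
            args=(TaskSyncEndpoint(i, coordinator), log, lock),
        )
        for i in range(num_tasks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Calling run_stage(4) returns a log in which every "before" entry precedes every
"after" entry, since no worker gets past request_sync until all four have
called it.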
> Design sketch: support barrier scheduling in Apache Spark
> ---------------------------------------------------------
>
> Key: SPARK-24375
> URL: https://issues.apache.org/jira/browse/SPARK-24375
> Project: Spark
> Issue Type: Story
> Components: Spark Core
> Affects Versions: 3.0.0
> Reporter: Xiangrui Meng
> Assignee: Jiang Xingbo
> Priority: Major
>
> This task is to outline a design sketch for the barrier scheduling SPIP
> discussion. It doesn't need to be a complete design before the vote. But it
> should at least cover both Scala/Java and PySpark.