Jiang Xingbo created SPARK-24874:
------------------------------------
Summary: Allow hybrid of both barrier tasks and regular tasks in a stage
Key: SPARK-24874
URL: https://issues.apache.org/jira/browse/SPARK-24874
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 3.0.0
Reporter: Jiang Xingbo
Currently, a barrier stage may contain only barrier tasks. However, consider the
following query:
{code}
val sc = new SparkContext(conf)
val rdd1 = sc.parallelize(1 to 100, 10)
val rdd2 = sc.parallelize(1 to 1000, 20).barrier().mapPartitions((it, ctx) => it)
val rdd = rdd1.union(rdd2).mapPartitions(t => t)
{code}
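Running `rdd.collect()` currently requires 30 free slots, because the union
stage has 30 partitions (10 from rdd1, 20 from rdd2) and every task in it is
treated as a barrier task. However, the tasks that read rdd1's partitions can be
launched as regular tasks, since they do not need to start together with the
barrier tasks. If we allow that, `rdd.collect()` needs only 20 free slots.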
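The slot arithmetic above can be sketched in plain Scala. This is only an illustrative model, not Spark's scheduler: `BarrierSlots`, `PartitionGroup`, `currentMinSlots`, and `proposedMinSlots` are hypothetical names. The model also assumes that regular tasks can run one at a time in whatever slots are free.

```scala
// Hypothetical model for illustration; none of these names are Spark APIs.
object BarrierSlots {
  // A group of partitions contributed by one parent RDD of the stage.
  final case class PartitionGroup(name: String, numPartitions: Int, isBarrier: Boolean)

  // Current behavior: if any parent is a barrier RDD, the whole stage is a
  // barrier stage and all of its tasks must be launched together.
  def currentMinSlots(groups: Seq[PartitionGroup]): Int = {
    val total = groups.map(_.numPartitions).sum
    if (groups.exists(_.isBarrier)) total else math.min(1, total)
  }

  // Proposed behavior: only barrier tasks must be launched together; regular
  // tasks are assumed to run one at a time in whatever slots are free.
  def proposedMinSlots(groups: Seq[PartitionGroup]): Int = {
    val barrier = groups.filter(_.isBarrier).map(_.numPartitions).sum
    if (barrier > 0) barrier else math.min(1, groups.map(_.numPartitions).sum)
  }

  def main(args: Array[String]): Unit = {
    val groups = Seq(
      PartitionGroup("rdd1", 10, isBarrier = false),
      PartitionGroup("rdd2", 20, isBarrier = true))
    println(s"current:  ${currentMinSlots(groups)}")   // 30
    println(s"proposed: ${proposedMinSlots(groups)}")  // 20
  }
}
```

For the query in this issue, the model gives 30 slots under the current rule and 20 under the proposed one, matching the numbers in the description.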