[ https://issues.apache.org/jira/browse/SPARK-24941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579930#comment-16579930 ]
Jiang Xingbo commented on SPARK-24941:
--------------------------------------
Shall we add something like `spark.default.parallelism`? Rather than a fixed
number, it could be a fraction, meaning that any barrier stage shall launch
fewer than fraction * totalCores tasks.
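A minimal sketch of how such a fractional cap could be evaluated, unlike `spark.default.parallelism`, which is a fixed count. Nothing below is a real Spark API or config: `BarrierLimit`, `maxBarrierTasks` and `fitsLimit` are hypothetical names, and the sketch assumes the bound is inclusive and rounded down.

```scala
// Hypothetical sketch of the fraction-based limit proposed in the comment.
// `BarrierLimit`, `maxBarrierTasks` and `fitsLimit` are illustrative names,
// not part of Spark.
object BarrierLimit {
  // Largest task count a barrier stage may launch when capped at
  // fraction * totalCores. Assumption: the bound is inclusive, rounded down.
  def maxBarrierTasks(fraction: Double, totalCores: Int): Int = {
    require(fraction > 0.0 && fraction <= 1.0, s"fraction must be in (0, 1], got $fraction")
    require(totalCores > 0, s"totalCores must be positive, got $totalCores")
    math.floor(fraction * totalCores).toInt
  }

  // True if a barrier stage with `numTasks` tasks stays within the cap.
  def fitsLimit(numTasks: Int, fraction: Double, totalCores: Int): Boolean =
    numTasks <= maxBarrierTasks(fraction, totalCores)
}
```

With a fraction of 0.5 on an 8-core cluster, the cap is 4 tasks, so a barrier stage with 5 tasks would be rejected. Because the limit scales with totalCores, the same setting keeps working as the cluster grows or shrinks.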
> Add RDDBarrier.coalesce() function
> ----------------------------------
>
> Key: SPARK-24941
> URL: https://issues.apache.org/jira/browse/SPARK-24941
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.4.0
> Reporter: Jiang Xingbo
> Priority: Major
>
> https://github.com/apache/spark/pull/21758#discussion_r204917245
> The number of partitions from the input data can be unexpectedly large, e.g.,
> if you do
> {code}
> sc.textFile(...).barrier().mapPartitions()
> {code}
> The number of input partitions is determined by the HDFS input splits. We
> shall provide a way in RDDBarrier to let users specify the number of tasks
> in a barrier stage, perhaps something like
> RDDBarrier.coalesce(numPartitions: Int).
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)