[jira] [Commented] (SPARK-24941) Add RDDBarrier.coalesce() function

Jiang Xingbo (JIRA) Tue, 14 Aug 2018 08:05:07 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-24941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579930#comment-16579930
 ]


Jiang Xingbo commented on SPARK-24941:
--------------------------------------

Shall we add something like `spark.default.parallelism`? It might not be a 
fixed number but rather a fraction, meaning that any barrier stage shall launch 
fewer tasks than fraction * totalCores?
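
A minimal sketch of how such a fraction-based cap could be computed, in plain Python with no Spark dependency. The names `max_barrier_tasks` and `capped_task_count` are hypothetical and not part of any Spark API. Treating the bound as inclusive and always allowing at least one task are assumptions made for illustration only.

```python
import math


def max_barrier_tasks(fraction: float, total_cores: int) -> int:
    """Upper bound on tasks for one barrier stage (hypothetical helper).

    Assumption: the bound is floor(fraction * total_cores), never below 1.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if total_cores < 1:
        raise ValueError("total_cores must be at least 1")
    # Round down so the stage never requests more slots than the fraction
    # allows, but always permit at least one task.
    return max(1, math.floor(fraction * total_cores))


def capped_task_count(num_input_partitions: int, fraction: float,
                      total_cores: int) -> int:
    """Tasks to launch: the input partition count, capped by the bound."""
    return min(num_input_partitions, max_barrier_tasks(fraction, total_cores))
```

For example, with a fraction of 0.5 on a 16-core cluster, an input with 2000 partitions would be capped at 8 tasks, while an input with 4 partitions would keep its 4 tasks.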

> Add RDDBarrier.coalesce() function
> ----------------------------------
>
>                 Key: SPARK-24941
>                 URL: https://issues.apache.org/jira/browse/SPARK-24941
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Jiang Xingbo
>            Priority: Major
>
> https://github.com/apache/spark/pull/21758#discussion_r204917245
> The number of partitions from the input data can be unexpectedly large, e.g. 
> if you do
> {code}
> sc.textFile(...).barrier().mapPartitions()
> {code}
> the number of input partitions is based on the HDFS input splits. We shall 
> provide a way in RDDBarrier to enable users to specify the number of tasks in 
> a barrier stage. Maybe something like RDDBarrier.coalesce(numPartitions: Int).
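
To illustrate what reducing the task count means, here is a simplified model in plain Python that groups input partitions into at most `num_partitions` contiguous groups, each group standing for one task. The function `group_partitions` is hypothetical. It is not Spark's coalesce implementation and is not the proposed RDDBarrier.coalesce(); in particular it ignores data locality.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def group_partitions(partitions: Sequence[T],
                     num_partitions: int) -> List[List[T]]:
    """Merge adjacent input partitions into at most num_partitions groups.

    Simplified model only: each returned group stands for one task.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    n = len(partitions)
    # Never create more groups than there are input partitions.
    k = min(num_partitions, n)
    groups: List[List[T]] = []
    for i in range(k):
        # Integer arithmetic spreads the remainder across the groups.
        start = i * n // k
        end = (i + 1) * n // k
        groups.append(list(partitions[start:end]))
    return groups
```

With 10 input splits and `num_partitions=3`, this yields three groups of sizes 3, 3, and 4, so the stage would run 3 tasks instead of 10.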



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

