[ https://issues.apache.org/jira/browse/SPARK-24941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16578430#comment-16578430 ]

Imran Rashid commented on SPARK-24941:
--------------------------------------

I don't think the answer here should be having the user specify 
{{numPartitions}}; that seems extremely hard for a user to get right given the 
way the rest of Spark works.

It's hard enough to choose the number of partitions normally with Spark, and 
users are currently trained to tune to the amount of data and to choose a 
*large* number. But with barrier execution that number needs to be tightly 
linked to the compute resources available at *runtime*. E.g. you might run your 
job with 50 executors one time, and so you decide to choose 50 tasks. Then you 
run the job again a little later, but now you only get 49 executors (busy 
cluster, maintenance, node failure, budget constraints, etc.), and your job 
doesn't make any progress.

I'd rather the user have some way to let the repartitioning happen 
automatically, based on the resources available at runtime.
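
For concreteness, here's a rough sketch of the kind of thing a user has to 
hand-roll today to tie the task count to runtime resources. The slot math via 
{{sc.statusTracker.getExecutorInfos}} and {{spark.executor.cores}} is only an 
approximation, and note the {{repartition()}}: as far as I understand, a 
non-shuffle {{coalesce()}} isn't allowed inside a barrier stage, so you pay for 
a full shuffle.

{code}
// Rough sketch, not the proposed API: derive the task count from the executors
// that happen to be registered right now, then repartition to it. The slot
// accounting below is an approximation (dynamic allocation, lost executors,
// etc. can all invalidate it), and repartition() costs a full shuffle.
import org.apache.spark.SparkContext

def currentSlots(sc: SparkContext): Int = {
  val coresPerExecutor = sc.getConf.getInt("spark.executor.cores", 1)
  // getExecutorInfos includes an entry for the driver, so drop one.
  val executors = sc.statusTracker.getExecutorInfos.length - 1
  math.max(1, executors * coresPerExecutor)
}

val input = sc.textFile("hdfs:///some/input")   // hypothetical path
val result = input
  .repartition(currentSlots(sc))   // shuffle boundary fixes the barrier stage's task count
  .barrier()
  .mapPartitions { iter =>
    // ... barrier task body ...
    iter
  }
{code}

And even that races against the cluster: an executor lost between computing the 
count and launching the stage puts you right back in the stuck-job scenario 
above, which is why I'd rather the framework own this decision.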

> Add RDDBarrier.coalesce() function
> ----------------------------------
>
>                 Key: SPARK-24941
>                 URL: https://issues.apache.org/jira/browse/SPARK-24941
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Jiang Xingbo
>            Priority: Major
>
> https://github.com/apache/spark/pull/21758#discussion_r204917245
> The number of partitions from the input data can be unexpectedly large, e.g. 
> if you do
> {code}
> sc.textFile(...).barrier().mapPartitions()
> {code}
> The number of input partitions is based on the HDFS input splits. We should 
> provide a way in RDDBarrier to enable users to specify the number of tasks in 
> a barrier stage, maybe something like RDDBarrier.coalesce(numPartitions: Int).
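
For reference, usage of the proposed API would presumably look something like 
the sketch below. {{RDDBarrier.coalesce(numPartitions: Int)}} does not exist 
yet; the call simply mirrors the signature suggested in the description.

{code}
// Hypothetical usage of the proposed API (RDDBarrier.coalesce does not exist
// yet; this mirrors the signature sketched in the issue description).
val rdd = sc.textFile("hdfs:///some/input")   // partition count driven by HDFS splits
  .barrier()
  .coalesce(50)                                // proposed: cap the barrier stage at 50 tasks
  .mapPartitions { iter =>
    // ... barrier task body ...
    iter
  }
{code}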


