[ https://issues.apache.org/jira/browse/SPARK-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092135#comment-14092135 ]

Xiangrui Meng commented on SPARK-2944:
--------------------------------------

Found that this behavior is not deterministic, so it is hard to tell which 
commit introduced it. It seems to happen when tasks are very small: some 
workers get far more assignments than others because they finish their tasks 
very quickly and TaskSetManager always picks the first available one. 
(There is no randomization in `TaskSetManager`.)
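The skew from a deterministic "first available" pick can be illustrated with a toy simulation (not Spark code; the worker count, per-task durations, and the `FirstFreeSkew`/`assign` names are made up for the example). Workers that differ only slightly in speed free up sooner, get picked again, and accumulate many more tiny tasks:

```scala
object FirstFreeSkew {
  /** Greedily assigns `numTasks` equal-size tasks to workers with the given
    * per-task durations, always picking whichever worker becomes idle first
    * (ties broken by lowest index, i.e. no randomization). Returns how many
    * tasks each worker received. */
  def assign(taskTime: Array[Double], numTasks: Int): Array[Int] = {
    val freeAt = Array.fill(taskTime.length)(0.0)
    val counts = Array.fill(taskTime.length)(0)
    for (_ <- 0 until numTasks) {
      // minBy returns the first minimum, so equal idle times always
      // resolve to the same worker, mirroring a deterministic scheduler.
      val w = freeAt.zipWithIndex.minBy(_._1)._2
      freeAt(w) += taskTime(w)
      counts(w) += 1
    }
    counts
  }

  def main(args: Array[String]): Unit = {
    // Four workers whose per-task times differ by only ~10% each.
    val counts = assign(Array(1.0, 1.1, 1.2, 1.3), 1000)
    println(counts.mkString(", "))
  }
}
```

Even with such small speed differences, the fastest worker ends up with noticeably more tasks than the slowest, which is the same mechanism suspected here, just with real cluster noise in place of the fixed durations.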

> sc.makeRDD doesn't distribute partitions evenly
> -----------------------------------------------
>
>                 Key: SPARK-2944
>                 URL: https://issues.apache.org/jira/browse/SPARK-2944
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.1.0
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>            Priority: Critical
>
> 16 nodes EC2 cluster:
> {code}
> val rdd = sc.makeRDD(0 until 1e9.toInt, 1000).cache()
> rdd.count()
> {code}
> Saw 156 partitions on one node while only 8 partitions on another.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
