[jira] [Updated] (SPARK-4019) Shuffling with more than 2000 reducers may drop all data when partitions are mostly empty or cause deserialization errors if at least one partition is empty

Josh Rosen (JIRA) Thu, 23 Oct 2014 09:51:02 -0700

     [ https://issues.apache.org/jira/browse/SPARK-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Josh Rosen updated SPARK-4019:
------------------------------
    Summary: Shuffling with more than 2000 reducers may drop all data when 
partitions are mostly empty or cause deserialization errors if at least one 
partition is empty  (was: Shuffling with more than 2000 map partitions may drop 
all data when partitions are mostly empty or cause deserialization errors if at 
least one partition is empty)

> Shuffling with more than 2000 reducers may drop all data when partitions are 
> mostly empty or cause deserialization errors if at least one partition is 
> empty
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-4019
>                 URL: https://issues.apache.org/jira/browse/SPARK-4019
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.2.0
>            Reporter: Xiangrui Meng
>            Assignee: Josh Rosen
>            Priority: Blocker
>
> {code}
> sc.makeRDD(0 until 10, 1000).repartition(2001).collect()
> {code}
> returns `Array()`.
> Spark 1.1.0 does not have this issue. I tried both the HASH and SORT shuffle 
> managers.
> This problem can also manifest itself as Snappy deserialization errors if the 
> average map output status size is non-zero but there is at least one empty 
> partition, e.g.:
> sc.makeRDD(0 until 100000, 1000).repartition(2001).collect()
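
The two symptoms above can be illustrated with a toy model in plain Scala. This is only a sketch under an assumption the report hints at but does not spell out: that with more than 2000 reducers each map task reports a single average block size instead of one size per reducer. The object and method names are hypothetical and are not Spark classes.

```scala
// Toy model only: these names are hypothetical and are not Spark classes.
object AverageSizeSketch {
  /** Collapse per-reducer block sizes (bytes) into one average via integer division. */
  def averageSize(blockSizes: Seq[Long]): Long =
    if (blockSizes.isEmpty) 0L else blockSizes.sum / blockSizes.length

  def main(args: Array[String]): Unit = {
    val reducers = 2001

    // Case 1: mostly empty. 2000 empty blocks and one 100-byte block.
    val mostlyEmpty = Seq.fill(reducers - 1)(0L) :+ 100L
    // 100 / 2001 == 0, so every block looks empty and no data would be fetched.
    println(averageSize(mostlyEmpty)) // prints 0

    // Case 2: a single empty block among 2000 blocks of 100 bytes each.
    val oneEmpty = 0L +: Seq.fill(reducers - 1)(100L)
    // 200000 / 2001 == 99, so the truly empty block looks non-empty; a reader
    // that tries to decompress it could then hit a deserialization error.
    println(averageSize(oneEmpty)) // prints 99
  }
}
```

In this model the average alone cannot distinguish an empty block from a non-empty one, which would match both the `Array()` result and the Snappy errors described in the report.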



