[ 
https://issues.apache.org/jira/browse/SPARK-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guoqiang Li resolved SPARK-3364.
--------------------------------
    Resolution: Fixed

> Zip equal-length but unequally-partition
> ----------------------------------------
>
>                 Key: SPARK-3364
>                 URL: https://issues.apache.org/jira/browse/SPARK-3364
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.2
>            Reporter: Kevin Jung
>             Fix For: 1.1.0
>
>
> ZippedRDD loses some elements when zipping RDDs that have equal numbers of 
> partitions but unequal numbers of elements in each partition.
> This can happen when a user creates an RDD with sc.textFile(path, partitionNumbers) 
> from a physically unbalanced HDFS file.
> {noformat}
> import org.apache.spark.RangePartitioner
> val x = sc.parallelize(1 to 9, 3)
> val y = sc.parallelize(Array(1,1,1,1,1,2,2,3,3), 3).keyBy(i => i)
> // z holds the same 9 elements as y, but redistributed by key range
> val z = y.partitionBy(new RangePartitioner(3, y))
> // expected: zipping with y keeps all 9 elements
> x.zip(y).count()
> // 9
> x.zip(y).collect()
> // Array[(Int, (Int, Int))] = Array((1,(1,1)), (2,(1,1)), (3,(1,1)), (4,(1,1)), 
> // (5,(1,1)), (6,(2,2)), (7,(2,2)), (8,(3,3)), (9,(3,3)))
> // unexpected: zipping with z silently drops 2 of the 9 elements
> x.zip(z).count()
> // 7
> x.zip(z).collect()
> // Array[(Int, (Int, Int))] = Array((1,(1,1)), (2,(1,1)), (3,(1,1)), (4,(2,2)), 
> // (5,(2,2)), (7,(3,3)), (8,(3,3)))
> {noformat}
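
The element loss reported above can be modeled without Spark. The following is a minimal sketch in plain Scala, not Spark's actual implementation: `ZipSketch` and `zipPartitions` are hypothetical names, and the 5/2/2 layout assumed for `z` is a guess at what the RangePartitioner produced, chosen because it is consistent with the output in the report. It pairs partitions by index and zips each pair, so any surplus elements in the longer partition are dropped.

```scala
// Plain Scala model of partition-wise zipping; no Spark dependency.
// ZipSketch and zipPartitions are hypothetical names, not Spark APIs.
object ZipSketch {
  type Partitions[A] = Seq[Seq[A]]

  // Pair the i-th partition of `left` with the i-th partition of `right`.
  // Seq.zip stops at the shorter side, so surplus elements vanish silently.
  def zipPartitions[A, B](left: Partitions[A], right: Partitions[B]): Partitions[(A, B)] = {
    require(left.length == right.length, "both sides need the same number of partitions")
    left.zip(right).map { case (l, r) => l.zip(r) }
  }

  // x: 1 to 9 split evenly into 3 partitions, as sc.parallelize(1 to 9, 3) would do.
  val x: Partitions[Int] = Seq(Seq(1, 2, 3), Seq(4, 5, 6), Seq(7, 8, 9))

  // z: ASSUMED layout after range partitioning by key (5 / 2 / 2 elements).
  val z: Partitions[(Int, Int)] = Seq(
    Seq.fill(5)((1, 1)),
    Seq.fill(2)((2, 2)),
    Seq.fill(2)((3, 3))
  )

  def main(args: Array[String]): Unit = {
    val zipped = zipPartitions(x, z).flatten
    println(zipped.size)            // number of surviving pairs
    println(zipped.mkString(", "))  // the pairs themselves
  }
}
```

Under these assumptions the per-partition sizes are 3/3/3 against 5/2/2, so the zipped partitions hold 3, 2 and 2 pairs: 7 in total, with the elements 6 and 9 of `x` left unpaired, matching the `x.zip(z)` output quoted above.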



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

