[
https://issues.apache.org/jira/browse/SPARK-5558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313434#comment-14313434
]
Charles Hayden edited comment on SPARK-5558 at 2/10/15 2:59 AM:
----------------------------------------------------------------
This seems to be working as expected in 1.3 branch and in master.
was (Author: cchayden):
This seems to be working as expected in 1.3 branch and in main.
> pySpark zip function unexpected errors
> --------------------------------------
>
> Key: SPARK-5558
> URL: https://issues.apache.org/jira/browse/SPARK-5558
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 1.2.0
> Reporter: Charles Hayden
> Labels: pyspark
>
> Example:
> {quote}
> x = sc.parallelize(range(0,5))
> y = x.map(lambda x: x+1000, preservesPartitioning=True)
> y.take(10)
> x.zip(y).collect()
> {quote}
> Fails in the JVM (via Py4J) with org.apache.spark.SparkException:
> Can only zip RDDs with same number of elements in each partition
> If the range is changed to range(0, 1000), it instead fails in PySpark code with:
> ValueError: Can not deserialize RDD with different number of items in pair:
> (100, 1)
> It also fails if y.take(10) is replaced with y.toDebugString().
> It even fails if we simply print y._jrdd.
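The error quoted above comes from the invariant RDD.zip enforces: the two RDDs must have the same number of partitions, and each corresponding pair of partitions must contain the same number of elements. A minimal pure-Python sketch of that rule (the `zip_partitions` helper is hypothetical, not PySpark's actual implementation):

```python
# Hypothetical illustration of the invariant behind RDD.zip's error:
# partitions are paired positionally, and each pair must be equal-length.

def zip_partitions(x_parts, y_parts):
    """Zip two lists-of-partitions element-wise, enforcing the RDD.zip rule."""
    if len(x_parts) != len(y_parts):
        raise ValueError("Can only zip RDDs with the same number of partitions")
    result = []
    for xp, yp in zip(x_parts, y_parts):
        if len(xp) != len(yp):
            # Mirrors the JVM-side message from the report.
            raise ValueError(
                "Can only zip RDDs with same number of elements in each partition")
        result.extend(zip(xp, yp))
    return result

# Partition counts and lengths line up, so zipping succeeds:
ok = zip_partitions([[0, 1], [2, 3, 4]], [[1000, 1001], [1002, 1003, 1004]])

# A pair of partitions with mismatched lengths raises, as in the report:
try:
    zip_partitions([[0, 1, 2], [3, 4]], [[1000], [1001, 1002, 1003, 1004]])
except ValueError as e:
    print(e)
```

In the report, `y.take(10)` (or `toDebugString()`, or printing `y._jrdd`) apparently changes how `y`'s partitions are materialized relative to `x`'s, breaking this alignment even though the two RDDs hold the same total number of elements.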
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)