[ https://issues.apache.org/jira/browse/SPARK-5558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles Hayden updated SPARK-5558:
----------------------------------

> pySpark zip function unexpected errors
> --------------------------------------
>
>                 Key: SPARK-5558
>                 URL: https://issues.apache.org/jira/browse/SPARK-5558
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.2.0
>            Reporter: Charles Hayden
>              Labels: pyspark

Description:

Example:
{quote}
x = sc.parallelize(range(0,5))
y = x.map(lambda x: x+1000, preservesPartitioning=True)
y.take(10)
x.zip(y).collect()
{quote}
This fails in the JVM with a Py4J error:
org.apache.spark.SparkException: Can only zip RDDs with same number of elements in each partition

If the range is changed to range(0,1000), it fails in the PySpark code instead:
ValueError: Can not deserialize RDD with different number of items in pair: (100, 1)

It also fails if y.take(10) is replaced with y.toDebugString(). It even fails if we print y._jrdd.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
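For context, the contract the JVM error is enforcing can be illustrated locally. The sketch below is a plain-Python model of that contract, not Spark's actual implementation: `zip_partitions` is a hypothetical helper, and the partition lists stand in for the per-partition contents of two RDDs. RDD.zip assumes both RDDs have the same number of partitions and the same number of elements in each corresponding partition, which is why a cached/evaluated RDD whose partitioning metadata no longer matches can trigger the errors above.

```python
def zip_partitions(parts_a, parts_b):
    """Model of RDD.zip's pairing contract (illustrative only, not Spark code).

    parts_a and parts_b are lists of partitions, each partition a list of
    elements. Zipping is element-wise within each corresponding partition.
    """
    if len(parts_a) != len(parts_b):
        raise ValueError("Can only zip RDDs with the same number of partitions")
    result = []
    for pa, pb in zip(parts_a, parts_b):
        # Spark raises a similar error at runtime when partition sizes differ.
        if len(pa) != len(pb):
            raise ValueError(
                "Can only zip RDDs with same number of elements in each partition"
            )
        result.append(list(zip(pa, pb)))
    return result


# Matching partition layout: zip succeeds.
ok = zip_partitions([[0, 1], [2, 3, 4]], [[1000, 1001], [1002, 1003, 1004]])

# Mismatched element counts per partition: zip must fail.
try:
    zip_partitions([[0, 1, 2]], [[1000, 1001]])
except ValueError as e:
    print(e)
```

When the per-partition counts cannot be guaranteed, a common workaround (assumption, not from this report) is to pair the two RDDs by index with zipWithIndex followed by a join, which does not depend on identical partitioning.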