[ https://issues.apache.org/jira/browse/SPARK-5558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-5558.
------------------------------
    Resolution: Fixed
    Fix Version/s: 1.3.0

Could be related to how https://issues.apache.org/jira/browse/SPARK-5351 was fixed. OK, let's close for now, since it seems to be verified as fixed for 1.3.

> pySpark zip function unexpected errors
> --------------------------------------
>
>                 Key: SPARK-5558
>                 URL: https://issues.apache.org/jira/browse/SPARK-5558
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.2.0
>            Reporter: Charles Hayden
>              Labels: pyspark
>             Fix For: 1.3.0
>
>
> Example:
> {quote}
> x = sc.parallelize(range(0, 5))
> y = x.map(lambda x: x + 1000, preservesPartitioning=True)
> y.take(10)
> x.zip(y).collect()
> {quote}
> Fails in the JVM (via Py4J) with org.apache.spark.SparkException:
> Can only zip RDDs with same number of elements in each partition
> If the range is changed to range(0, 1000), it fails in pySpark code instead:
> ValueError: Can not deserialize RDD with different number of items in pair:
> (100, 1)
> It also fails if y.take(10) is replaced with y.toDebugString().
> It even fails if we just print y._jrdd.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
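For context on the error message in the report: RDD.zip pairs the two RDDs partition by partition, element by element, so every pair of corresponding partitions must have identical lengths. Below is a minimal sketch in plain Python (no Spark required) that models RDDs as lists of partitions to illustrate that invariant; the function name and structure are illustrative, not Spark's internals.

```python
def zip_partitions(parts_a, parts_b):
    """Pair two 'RDDs' (modeled as lists of partitions) the way RDD.zip does.

    This is an illustrative model, not Spark code: zip requires both the
    same number of partitions and the same number of elements in each
    corresponding partition.
    """
    if len(parts_a) != len(parts_b):
        raise ValueError("Can only zip RDDs with the same number of partitions")
    result = []
    for pa, pb in zip(parts_a, parts_b):
        if len(pa) != len(pb):
            # This mirrors the SparkException quoted in the report.
            raise ValueError(
                "Can only zip RDDs with same number of elements in each partition"
            )
        result.append(list(zip(pa, pb)))
    return result


# Matching partitioning on both sides works, as in x.zip(x.map(f)):
x = [[0, 1], [2, 3, 4]]
y = [[1000, 1001], [1002, 1003, 1004]]
print(zip_partitions(x, y))

# Mismatched per-partition counts reproduce the reported failure mode:
bad = [[1000], [1001, 1002, 1003, 1004]]
try:
    zip_partitions(x, bad)
except ValueError as e:
    print(e)
```

In real PySpark the mismatch in this ticket came from an internal re-batching of the mapped RDD rather than from user code, which is why the seemingly harmless y.take(10) changed the outcome.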