[
https://issues.apache.org/jira/browse/SPARK-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703147#comment-14703147
]
Abhinav Mishra commented on SPARK-10112:
----------------------------------------
I have put the assertion before that line as well; list1 is just a Python
list of numbers, and rdd2 is the RDD created from that list.
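For what it's worth, when numSlices is not passed, sc.parallelize() falls back to the context's default parallelism, which typically depends on the machine's core count. zip() requires not only the same number of partitions but also the same number of elements in each corresponding partition, so the same list can be split differently on two machines. A pure-Python sketch of this (the slicing helper below is illustrative, not PySpark API, though it mimics how parallelize slices a list):

```python
def slice_list(data, num_slices):
    """Mimic how PySpark's parallelize splits a list into partitions:
    partition i holds data[i*n // num_slices : (i+1)*n // num_slices]."""
    n = len(data)
    return [data[(i * n) // num_slices:((i + 1) * n) // num_slices]
            for i in range(num_slices)]

elements = list(range(10))

# Hypothetical machine A: defaultParallelism == 2
parts_a = slice_list(elements, 2)
# Hypothetical machine B: defaultParallelism == 4
parts_b = slice_list(elements, 4)

# Per-partition element counts differ, so zip() between RDDs
# partitioned like parts_a and parts_b would raise the ValueError.
print([len(p) for p in parts_a])  # [5, 5]
print([len(p) for p in parts_b])  # [2, 3, 2, 3]
```

If the element counts match overall but the partitioning does not, a more robust pattern than zip() is to index both RDDs with zipWithIndex() and join on the index.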
> ValueError: Can only zip with RDD which has the same number of partitions on
> one machine but not on another
> -----------------------------------------------------------------------------------------------------------
>
> Key: SPARK-10112
> URL: https://issues.apache.org/jira/browse/SPARK-10112
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Environment: Ubuntu 14.04.2 LTS
> Reporter: Abhinav Mishra
>
> I have this piece of code which works fine on one machine but when I run this
> on another machine I get error as - "ValueError: Can only zip with RDD which
> has the same number of partitions". My code is:
> rdd2 = sc.parallelize(list1)
> rdd3 = rdd1.zip(rdd2).map(lambda ((x1,x2,x3,x4), y): (y,x2, x3, x4))
> list = rdd3.collect()
> assert rdd1.getNumPartitions() == rdd2.getNumPartitions()
> My rdd1 has this structure - [(1,2,3),(4,5,6)....]. My rdd2 has this
> structure - [1,2,3....]
>
> Both my RDDs - rdd1 and rdd2 - have the same number of elements and the same
> number of partitions (both have 1 partition), and I tried to use
> repartition() as well, but it does not resolve the issue.
> The above code works fine on one machine but throws an error on another. I
> tried to look for an explanation but I couldn't find any specific reason for
> this behavior. I have Spark 1.3 on the machine on which it runs without any
> error and Spark 1.4 on the machine on which this error occurs.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]