[
https://issues.apache.org/jira/browse/SPARK-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703147#comment-14703147
]
Abhinav Mishra commented on SPARK-10112:
----------------------------------------
I have put the assertion before that line as well; list1 is just a Python
list of numbers, and rdd2 is the RDD created from that list.
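For what it's worth, when numSlices is not passed, sc.parallelize() falls back to the context's default parallelism, which typically depends on the machine's core count. zip() requires not only the same number of partitions but also the same number of elements in each corresponding partition, so the same list can be split differently on two machines. A pure-Python sketch of this (the slicing helper below is illustrative, not PySpark API, though it mimics how parallelize slices a list):

```python
def slice_list(data, num_slices):
    """Mimic how PySpark's parallelize splits a list into partitions:
    partition i holds data[i*n // num_slices : (i+1)*n // num_slices]."""
    n = len(data)
    return [data[(i * n) // num_slices:((i + 1) * n) // num_slices]
            for i in range(num_slices)]

elements = list(range(10))

# Hypothetical machine A: defaultParallelism == 2
parts_a = slice_list(elements, 2)
# Hypothetical machine B: defaultParallelism == 4
parts_b = slice_list(elements, 4)

# Per-partition element counts differ, so zip() between RDDs
# partitioned like parts_a and parts_b would raise the ValueError.
print([len(p) for p in parts_a])  # [5, 5]
print([len(p) for p in parts_b])  # [2, 3, 2, 3]
```

If the element counts match overall but the partitioning does not, a more robust pattern than zip() is to index both RDDs with zipWithIndex() and join on the index.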
> ValueError: Can only zip with RDD which has the same number of partitions on
> one machine but not on another
> -----------------------------------------------------------------------------------------------------------
>
> Key: SPARK-10112
> URL: https://issues.apache.org/jira/browse/SPARK-10112
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Environment: Ubuntu 14.04.2 LTS
> Reporter: Abhinav Mishra
>
> I have this piece of code which works fine on one machine but when I run this
> on another machine I get error as - "ValueError: Can only zip with RDD which
> has the same number of partitions". My code is:
> rdd2 = sc.parallelize(list1)
> rdd3 = rdd1.zip(rdd2).map(lambda ((x1,x2,x3,x4), y): (y,x2, x3, x4))
> list = rdd3.collect()
> assert rdd1.getNumPartitions() == rdd2.getNumPartitions()
> My rdd1 has this structure - [(1,2,3),(4,5,6)....]. My rdd2 has this
> structure - [1,2,3....]
>
> Both my RDDs - rdd1 and rdd2 - have the same number of elements and the same
> number of partitions (both have 1 partition), and I tried to use
> repartition() as well, but it does not resolve the issue.
> The above code works fine on one machine but throws an error on another. I
> tried to look for an explanation but I couldn't find any specific reason for
> this behavior. I have Spark 1.3 on the machine on which it runs without any
> error and Spark 1.4 on the machine on which this error occurs.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]