[ https://issues.apache.org/jira/browse/SPARK-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319316#comment-14319316 ]
Brennon York commented on SPARK-5790:
-------------------------------------

FWIW, this issue is a blocker for [SPARK-4600|https://issues.apache.org/jira/browse/SPARK-4600], which I'm working on: `diff` relies on `zipPartitions`, and that is what throws here. If someone could assign this issue to me, I'll continue working on it.

> VertexRDD's won't zip properly for `diff` capability
> ----------------------------------------------------
>
>                 Key: SPARK-5790
>                 URL: https://issues.apache.org/jira/browse/SPARK-5790
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>            Reporter: Brennon York
>
> For VertexRDDs with differing numbers of partitions, one cannot run commands like
> `diff`, as it will throw an IllegalArgumentException. The code below provides
> an example:
> {code}
> import org.apache.spark.graphx._
> import org.apache.spark.rdd._
>
> val setA: VertexRDD[Int] = VertexRDD(sc.parallelize(0L until 3L).map(id => (id, id.toInt+1)))
> setA.collect.foreach(println(_))
>
> val setB: VertexRDD[Int] = VertexRDD(sc.parallelize(2L until 4L).map(id => (id, id.toInt+2)))
> setB.collect.foreach(println(_))
>
> // Works: setA and setB have the same number of partitions.
> val diff = setA.diff(setB)
> diff.collect.foreach(println(_))
>
> // ++ (union) sums the partition counts, so setC has more partitions than setA.
> val setC: VertexRDD[Int] = VertexRDD(sc.parallelize(2L until 4L).map(id => (id, id.toInt+2)) ++ sc.parallelize(6L until 8L).map(id => (id, id.toInt+2)))
> setA.diff(setC).collect
> // java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions
> {code}
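The exception comes from the positional zip: partition i of one side is paired with partition i of the other, so both sides must have the same partition count, and for `diff` to be correct each vertex id must also land in the matching partition on both sides. Below is a small plain-Scala model of that constraint, with no Spark dependency. `Partitioned`, `hashPartition`, `zipPartitions` and `diff` here are hypothetical stand-ins written for illustration, not GraphX APIs; the `diff` semantics (keep ids present on both sides whose values differ, taking the value from the other side) are paraphrased from the `VertexRDD.diff` scaladoc, and the partition counts 2 and 4 are assumed.

```scala
// Plain-Scala model (no Spark dependency) of why diff fails on unequal partition counts.
// Partitioned, hashPartition, zipPartitions and diff are hypothetical stand-ins, not GraphX APIs.
object ZipModel {
  type VertexId = Long
  // Partition i holds the (id, value) pairs whose id hashes to bucket i.
  type Partitioned[V] = Vector[Map[VertexId, V]]

  // Hash-partition pairs into n buckets, similar in spirit to a HashPartitioner.
  def hashPartition[V](pairs: Seq[(VertexId, V)], n: Int): Partitioned[V] = {
    val buckets = pairs.groupBy { case (id, _) => Math.floorMod(id.hashCode, n) }
    Vector.tabulate(n)(i => buckets.getOrElse(i, Seq.empty[(VertexId, V)]).toMap)
  }

  // Positional zip of partitions; throws IllegalArgumentException on a count mismatch.
  def zipPartitions[A, B, C](a: Partitioned[A], b: Partitioned[B])(
      f: (Map[VertexId, A], Map[VertexId, B]) => Map[VertexId, C]): Partitioned[C] = {
    require(a.length == b.length, "Can't zip RDDs with unequal numbers of partitions")
    a.zip(b).map { case (pa, pb) => f(pa, pb) }
  }

  // Keeps ids present on both sides whose values differ, taking the value from `other`.
  def diff[V](left: Partitioned[V], other: Partitioned[V]): Partitioned[V] =
    zipPartitions[V, V, V](left, other) { (pa, pb) =>
      pb.filter { case (id, v) => pa.get(id).exists(_ != v) }
    }

  def main(args: Array[String]): Unit = {
    // Same data as the repro; the partition counts 2 and 4 are assumed for illustration.
    val a = (0L until 3L).map(id => (id, id.toInt + 1))
    val c = (2L until 4L).map(id => (id, id.toInt + 2)) ++
      (6L until 8L).map(id => (id, id.toInt + 2))
    val setA = hashPartition(a, 2)
    val setC = hashPartition(c, 4)

    // Mirrors the reported failure: prints "unequal partitions fail: true".
    println(s"unequal partitions fail: ${scala.util.Try(diff(setA, setC)).isFailure}")

    // Re-partitioning c with setA's partition count puts each id in the matching bucket.
    val aligned = hashPartition(c, setA.length)
    println(diff(setA, aligned).flatMap(_.toSeq).sortBy(_._1)) // Vector((2,4))
  }
}
```

In spark-shell, the analogous workaround would presumably be to build setC from a pair RDD already partitioned like setA, e.g. `VertexRDD(rddC.partitionBy(setA.partitioner.get))`, where `rddC` is the unioned pair RDD from the repro. This assumes `VertexRDD.apply` keeps an existing partitioner rather than re-hashing, which is how the 1.2-era source appears to behave.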