[ https://issues.apache.org/jira/browse/SPARK-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329252#comment-14329252 ]
Brennon York commented on SPARK-5790:
-------------------------------------

[~maropu] this looks very similar to the work I just pushed up for [SPARK-1955|https://github.com/apache/spark/pull/4705], which was acting as the overarching issue for this ticket. I didn't write tests, though, and those would be a major benefit. Would you be willing to refactor and include only the tests to close this issue out? That would help out tremendously, and I wouldn't want to lose that effort!

> VertexRDD's won't zip properly for `diff` capability
> ----------------------------------------------------
>
>                 Key: SPARK-5790
>                 URL: https://issues.apache.org/jira/browse/SPARK-5790
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>            Reporter: Brennon York
>            Assignee: Brennon York
>
> For VertexRDD's with differing numbers of partitions one cannot run commands
> like `diff`, as it will throw an IllegalArgumentException. The code below
> provides an example:
> {code}
> import org.apache.spark.graphx._
> import org.apache.spark.rdd._
> val setA: VertexRDD[Int] = VertexRDD(sc.parallelize(0L until 3L).map(id => (id, id.toInt+1)))
> setA.collect.foreach(println(_))
> val setB: VertexRDD[Int] = VertexRDD(sc.parallelize(2L until 4L).map(id => (id, id.toInt+2)))
> setB.collect.foreach(println(_))
> val diff = setA.diff(setB)
> diff.collect.foreach(println(_))
> val setC: VertexRDD[Int] = VertexRDD(sc.parallelize(2L until 4L).map(id => (id, id.toInt+2)) ++ sc.parallelize(6L until 8L).map(id => (id, id.toInt+2)))
> setA.diff(setC).collect
> // java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions
> {code}
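
For readers hitting the same exception, here is a minimal sketch of one possible workaround for the failing `setA.diff(setC)` call in the quoted example. It is not necessarily what the SPARK-1955 pull request does. It assumes the spark-shell `sc`, as the quoted example does, and the name `setCOnA` is made up for illustration. The idea is to re-index `setC` against `setA` with `aggregateUsingIndex`, so that both sides have the same number of partitions before `diff` zips them. Note the assumption that vertices of `setC` whose ids are absent from `setA` can be dropped, since `aggregateUsingIndex` only keeps ids already in the index.

```scala
import org.apache.spark.graphx._

// Assumes `sc` is the SparkContext provided by spark-shell.
val setA: VertexRDD[Int] =
  VertexRDD(sc.parallelize(0L until 3L).map(id => (id, id.toInt + 1)))

// The union of two RDDs has more partitions than setA, which is what
// triggers "Can't zip RDDs with unequal numbers of partitions".
val setC: VertexRDD[Int] = VertexRDD(
  sc.parallelize(2L until 4L).map(id => (id, id.toInt + 2)) ++
    sc.parallelize(6L until 8L).map(id => (id, id.toInt + 2)))

// Re-index setC using setA's index and partitioner. The reduce function
// only matters for duplicate ids; here it keeps the first value seen.
// `setCOnA` is a hypothetical name, not part of any Spark API.
val setCOnA: VertexRDD[Int] =
  setA.aggregateUsingIndex(setC, (a: Int, b: Int) => a)

// Both sides now share setA's partitioning, so the zip inside diff can proceed.
val diff = setA.diff(setCOnA)
diff.collect.foreach(println(_))
```

The trade-off is an extra shuffle of `setC`, so this is only a stopgap until `diff` handles differently partitioned inputs itself.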