[ https://issues.apache.org/jira/browse/SPARK-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353617#comment-14353617 ]
Ankur Dave commented on SPARK-6022:
-----------------------------------

[~maropu] is correct: the original intent of {{diff}} was to operate on values, not VertexIds. It was really written for internal use in [mapVertices|https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/impl/GraphImpl.scala#L133] and [outerJoinVertices|https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/impl/GraphImpl.scala#L284], which use it to find the set of vertices whose values have changed, so that only those need to be shipped to the edge partitions. Based on your test, you're looking for the set difference on VertexIds. Maybe you could introduce a new method called "minus"?

> GraphX `diff` test incorrectly operating on values (not VertexIds)
> -------------------------------------------------------------------
>
>                 Key: SPARK-6022
>                 URL: https://issues.apache.org/jira/browse/SPARK-6022
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>            Reporter: Brennon York
>
> The current GraphX {{diff}} test operates on values rather than the VertexIds and, if {{diff}} were working properly (per [SPARK-4600|https://issues.apache.org/jira/browse/SPARK-4600]), it should fail this test. The code to test {{diff}} should look like the below, as it correctly generates {{VertexRDD}}s with different {{VertexId}}s to {{diff}} against.
> {code}
> test("diff functionality with small concrete values") {
>   withSpark { sc =>
>     val setA: VertexRDD[Int] = VertexRDD(sc.parallelize(0L until 2L).map(id => (id, id.toInt)))
>     // setA := Set((0L, 0), (1L, 1))
>     val setB: VertexRDD[Int] = VertexRDD(sc.parallelize(1L until 3L).map(id => (id, id.toInt + 2)))
>     // setB := Set((1L, 3), (2L, 4))
>     val diff = setA.diff(setB)
>     assert(diff.collect.toSet == Set((2L, 4)))
>   }
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
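To make the two semantics concrete, here is a minimal plain-Scala sketch (not the GraphX API, and runnable without Spark) that models each VertexRDD's contents as a Map from VertexId to value. The helper names `valueDiff` and `minus` are hypothetical; `minus` is modeled here to match the expectation in the proposed test above (entries of the argument whose ids are absent from the receiver).

{code}
// Sketch only: models VertexRDD contents as Map[VertexId, Int].
object DiffVsMinus extends App {
  type VertexId = Long

  // Value-based diff, per the original intent described above: among ids
  // present on both sides, keep entries of `b` whose value has changed.
  def valueDiff(a: Map[VertexId, Int], b: Map[VertexId, Int]): Map[VertexId, Int] =
    b.filter { case (id, v) => a.get(id).exists(_ != v) }

  // Id-based set difference ("minus"): entries of `b` whose id does not
  // appear in `a`, which is what the proposed test actually checks.
  def minus(a: Map[VertexId, Int], b: Map[VertexId, Int]): Map[VertexId, Int] =
    b.filter { case (id, _) => !a.contains(id) }

  val setA = Map(0L -> 0, 1L -> 1) // Set((0L, 0), (1L, 1))
  val setB = Map(1L -> 3, 2L -> 4) // Set((1L, 3), (2L, 4))

  assert(valueDiff(setA, setB) == Map(1L -> 3)) // id 1 changed value: 1 -> 3
  assert(minus(setA, setB) == Map(2L -> 4))     // id 2 exists only in setB
  println("ok")
}
{code}

On the example data, the two methods return disjoint results, which is why a test written against value semantics cannot also validate id-based set difference.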