[ 
https://issues.apache.org/jira/browse/SPARK-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353617#comment-14353617
 ] 

Ankur Dave commented on SPARK-6022:
-----------------------------------

[~maropu] is correct: the original intent of diff was to operate on values, not 
VertexIds. It was really written for internal use in 
[mapVertices|https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/impl/GraphImpl.scala#L133]
 and 
[outerJoinVertices|https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/impl/GraphImpl.scala#L284],
 which use it to find the set of vertices whose values have changed so they can 
ship only those to the edge partitions.
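
As a rough illustration of that value-oriented behavior, here is a minimal sketch that models the vertex sets as plain Scala Maps instead of VertexRDDs. The names ValueDiffSketch and valueDiff are made up for this example and are not part of GraphX. The sketch only mirrors the idea of keeping the vertices whose values changed, not the actual partition-level implementation.

{code}
object ValueDiffSketch {
  type VertexId = Long

  // Model of a value-oriented diff (an assumption for illustration):
  // keep the vertices present in both maps whose value differs,
  // taking the value from `other`.
  def valueDiff[A](self: Map[VertexId, A], other: Map[VertexId, A]): Map[VertexId, A] =
    other.filter { case (id, v) => self.get(id).exists(_ != v) }

  def main(args: Array[String]): Unit = {
    val before = Map(0L -> 0, 1L -> 1) // vertex values before a map step
    val after  = Map(0L -> 0, 1L -> 5) // same ids, vertex 1 has a new value
    println(valueDiff(before, after))  // Map(1 -> 5)
  }
}
{code}

In this model only vertex 1 would need to be shipped to the edge partitions, since vertex 0 kept its value.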

Based on your test, you're looking for the set difference on VertexIds. Maybe 
you could introduce a new method called "minus" for that?
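
A minimal sketch of what such a set difference could look like, again on plain Scala Maps rather than VertexRDDs. The object MinusSketch and the signature of minus are assumptions for illustration only. Nothing here is an existing GraphX API.

{code}
object MinusSketch {
  type VertexId = Long

  // Hypothetical set difference on ids: keep the entries of `self`
  // whose id does not appear in `other`. Values are ignored.
  def minus[A](self: Map[VertexId, A], other: Map[VertexId, A]): Map[VertexId, A] =
    self.filter { case (id, _) => !other.contains(id) }

  def main(args: Array[String]): Unit = {
    val setA = Map(0L -> 0, 1L -> 1) // same entries as setA in the test
    val setB = Map(1L -> 3, 2L -> 4) // same entries as setB in the test
    println(minus(setA, setB)) // Map(0 -> 0)
    println(minus(setB, setA)) // Map(2 -> 4)
  }
}
{code}

With the values from your test the two directions give different results, so the documentation of such a method would need to state which side is kept.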

> GraphX `diff` test incorrectly operating on values (not VertexId's)
> -------------------------------------------------------------------
>
>                 Key: SPARK-6022
>                 URL: https://issues.apache.org/jira/browse/SPARK-6022
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>            Reporter: Brennon York
>
> The current GraphX {{diff}} test operates on values rather than on 
> {{VertexId}}s and, if {{diff}} were working properly (per 
> [SPARK-4600|https://issues.apache.org/jira/browse/SPARK-4600]), it should 
> fail this test. The code to test {{diff}} should look like the code below, 
> which correctly generates {{VertexRDD}}s with different {{VertexId}}s to 
> {{diff}} against.
> {code}
> test("diff functionality with small concrete values") {
>   withSpark { sc =>
>     val setA: VertexRDD[Int] = VertexRDD(sc.parallelize(0L until 2L).map(id => (id, id.toInt)))
>     // setA := Set((0L, 0), (1L, 1))
>     val setB: VertexRDD[Int] = VertexRDD(sc.parallelize(1L until 3L).map(id => (id, id.toInt + 2)))
>     // setB := Set((1L, 3), (2L, 4))
>     val diff = setA.diff(setB)
>     assert(diff.collect.toSet == Set((2L, 4)))
>   }
> }
> {code}



