Glenn Strycker created SPARK-1883:
-------------------------------------
Summary: spark graphx triplets.map does not return correct values
Key: SPARK-1883
URL: https://issues.apache.org/jira/browse/SPARK-1883
Project: Spark
Issue Type: Bug
Reporter: Glenn Strycker
graph.triplets does not work -- it returns incorrect results
I have a graph with the following edges:
orig_graph.edges.collect
= Array(Edge(1,4,1), Edge(1,5,1), Edge(1,7,1), Edge(2,5,1), Edge(2,6,1),
Edge(3,5,1), Edge(3,6,1), Edge(3,7,1), Edge(4,1,1), Edge(5,1,1), Edge(5,2,1),
Edge(5,3,1), Edge(6,2,1), Edge(6,3,1), Edge(7,1,1), Edge(7,3,1))
When I run triplets.collect, I only get the last edge repeated 16 times:
orig_graph.triplets.collect
= Array(((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1),
((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1),
((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1),
((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1))
I've also tried adding various map steps before calling triplets, but I get
the same results as above.
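One pattern that produces exactly this symptom (every collected slot showing the last element) is an iterator that hands out a single mutable object and overwrites it on each step. Whether GraphX does this internally is an assumption here, not something this report confirms. The sketch below is plain Scala with no Spark dependency; MutableTriplet and TripletReuseSketch are hypothetical stand-ins, and the three edges are a subset of the edges listed above.

```scala
// Hypothetical stand-in for illustration only; this is NOT a GraphX class.
final class MutableTriplet {
  var srcId: Long = 0L
  var dstId: Long = 0L
  var attr: Int = 0
  override def toString: String = s"($srcId,$dstId,$attr)"
}

object TripletReuseSketch {
  // A few of the edges from the report, as (srcId, dstId, attr).
  val edges: Seq[(Long, Long, Int)] = Seq((1L, 4L, 1), (1L, 5L, 1), (7L, 3L, 1))

  // Iterator that returns the SAME mutable object from every step,
  // overwriting its fields each time (the assumed failure mode).
  def reusingIterator: Iterator[MutableTriplet] = {
    val shared = new MutableTriplet
    edges.iterator.map { case (s, d, a) =>
      shared.srcId = s
      shared.dstId = d
      shared.attr = a
      shared
    }
  }

  // Materializing the references: every slot points at the one shared object,
  // so all of them show whatever was written last.
  def collectedReferences: Vector[String] =
    reusingIterator.toVector.map(_.toString)

  // Copying the fields into an immutable tuple while iterating keeps each value.
  def collectedCopies: Vector[(Long, Long, Int)] =
    reusingIterator.map(t => (t.srcId, t.dstId, t.attr)).toVector

  def main(args: Array[String]): Unit = {
    println(collectedReferences.mkString(", ")) // (7,3,1), (7,3,1), (7,3,1)
    println(collectedCopies.mkString(", "))     // (1,4,1), (1,5,1), (7,3,1)
  }
}
```

If the same thing is happening here, copying the fields out of each triplet before collecting, for example orig_graph.triplets.map(t => (t.srcId, t.dstId, t.attr)).collect, might return distinct edges. This has not been verified against the version in the report.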
Similarly, the example on the GraphX programming guide page
(http://spark.apache.org/docs/0.9.0/graphx-programming-guide.html) is
incorrect.
val facts: RDD[String] =
  graph.triplets.map(triplet =>
    triplet.srcAttr._1 + " is the " + triplet.attr + " of " +
      triplet.dstAttr._1)
does not work, but
val facts: RDD[String] =
  graph.triplets.map(triplet =>
    triplet.srcAttr + " is the " + triplet.attr + " of " + triplet.dstAttr)
does work, although the results are meaningless. For my graph example, I get
the following line repeated 16 times:
1 is the 1 of 1
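A possible reading of the two variants, offered as an assumption rather than a diagnosis: the guide's version calls ._1 because its vertex attributes are tuples, while the triplets printed above, such as ((7,1),(3,1),1), suggest this graph carries plain Int attributes that are all 1. The sketch below uses a hypothetical TripletSketch case class (not the GraphX EdgeTriplet) and illustrative guide-style values to show both shapes in plain Scala.

```scala
// Hypothetical stand-in for a GraphX edge triplet; not the real EdgeTriplet class.
final case class TripletSketch[VD, ED](srcAttr: VD, attr: ED, dstAttr: VD)

object FactsSketch {
  // Guide-style attributes (illustrative values): each vertex carries a (name, role) tuple.
  val guideStyle: TripletSketch[(String, String), String] =
    TripletSketch(("rxin", "student"), "collab", ("jgonzal", "postdoc"))

  // Attributes as they appear in the report's output: plain Int values, all 1.
  val intStyle: TripletSketch[Int, Int] = TripletSketch(1, 1, 1)

  // ._1 is available because srcAttr and dstAttr are tuples here.
  def guideFact(t: TripletSketch[(String, String), String]): String =
    t.srcAttr._1 + " is the " + t.attr + " of " + t.dstAttr._1

  // With Int attributes there is no ._1, so the attributes are used directly.
  def intFact(t: TripletSketch[Int, Int]): String =
    s"${t.srcAttr} is the ${t.attr} of ${t.dstAttr}"

  def main(args: Array[String]): Unit = {
    println(guideFact(guideStyle)) // rxin is the collab of jgonzal
    println(intFact(intStyle))     // 1 is the 1 of 1
  }
}
```

Under that assumption, "1 is the 1 of 1" is what every edge would produce when all vertex and edge attributes equal 1, so this line alone would not distinguish a correct result from the repeated-triplet problem shown earlier.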