This was caused by an optimization in GraphX that reuses a single triplet object. When you call collect directly on triplets, every element of the returned array is that same object, so you only see the last values written to it.
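To make the effect concrete, here is a minimal Spark-free sketch of the same pitfall. It is only an illustration and not GraphX's actual code: the Triplet class, the TripletReuseDemo object, and the reusingIterator helper are hypothetical names made up for this example, and the three edges are a subset of the edges from the report.

```scala
// Hypothetical stand-in for a GraphX triplet: a mutable record (not the real EdgeTriplet).
final class Triplet(var srcId: Long, var dstId: Long, var attr: Int) {
  def copy(): Triplet = new Triplet(srcId, dstId, attr)
  override def toString: String = s"($srcId,$dstId,$attr)"
}

object TripletReuseDemo {
  // A few edges taken from the report, as (src, dst, attr).
  val edges: Seq[(Long, Long, Int)] = Seq((1L, 4L, 1), (1L, 5L, 1), (7L, 3L, 1))

  // An iterator that overwrites and returns ONE shared object for every element,
  // mimicking an allocation-saving optimization.
  def reusingIterator(): Iterator[Triplet] = {
    val shared = new Triplet(0L, 0L, 0)
    edges.iterator.map { case (s, d, a) =>
      shared.srcId = s
      shared.dstId = d
      shared.attr = a
      shared
    }
  }

  // Materializing directly stores three references to the same object.
  val aliased: Vector[Triplet] = reusingIterator().toVector

  // Copying each element as it is produced stores three distinct snapshots.
  val copied: Vector[Triplet] = reusingIterator().map(_.copy()).toVector

  def main(args: Array[String]): Unit = {
    println(aliased.mkString(", ")) // prints (7,3,1), (7,3,1), (7,3,1)
    println(copied.mkString(", "))  // prints (1,4,1), (1,5,1), (7,3,1)
  }
}
```

The copy has to happen inside the map, while each element is current. Copying after the collection has been materialized would only duplicate the last state.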
It has been fixed in Spark 1.0 here: https://issues.apache.org/jira/browse/SPARK-1188

To work around this in older versions of Spark, you can add a copy step before collecting, e.g.

    graph.triplets.map(_.copy()).collect()

On Mon, May 19, 2014 at 1:09 PM, GlennStrycker <glenn.stryc...@gmail.com> wrote:

> graph.triplets does not work -- it returns incorrect results
>
> I have a graph with the following edges:
>
> orig_graph.edges.collect
> =  Array(Edge(1,4,1), Edge(1,5,1), Edge(1,7,1), Edge(2,5,1), Edge(2,6,1),
> Edge(3,5,1), Edge(3,6,1), Edge(3,7,1), Edge(4,1,1), Edge(5,1,1),
> Edge(5,2,1), Edge(5,3,1), Edge(6,2,1), Edge(6,3,1), Edge(7,1,1),
> Edge(7,3,1))
>
> When I run triplets.collect, I only get the last edge repeated 16 times:
>
> orig_graph.triplets.collect
> = Array(((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1),
> ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1),
> ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1),
> ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1))
>
> I've also tried writing various map steps first before calling the triplet
> function, but I get the same results as above.
>
> Similarly, the example on the graphx programming guide page
> (http://spark.apache.org/docs/0.9.0/graphx-programming-guide.html) is
> incorrect.
>
> val facts: RDD[String] =
>   graph.triplets.map(triplet =>
>     triplet.srcAttr._1 + " is the " + triplet.attr + " of " +
>     triplet.dstAttr._1)
>
> does not work, but
>
> val facts: RDD[String] =
>   graph.triplets.map(triplet =>
>     triplet.srcAttr + " is the " + triplet.attr + " of " + triplet.dstAttr)
>
> does work, although the results are meaningless.
> For my graph example, I get the following line repeated 16 times:
>
> 1 is the 1 of 1
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/BUG-graph-triplets-does-not-return-proper-values-tp6693.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.