zhangzhenyue created SPARK-6378:
-----------------------------------

             Summary: srcAttr in graph.triplets don't update when the size of 
graph is huge
                 Key: SPARK-6378
                 URL: https://issues.apache.org/jira/browse/SPARK-6378
             Project: Spark
          Issue Type: Bug
          Components: GraphX
    Affects Versions: 1.2.1
            Reporter: zhangzhenyue


When the graph is very large (0.2 billion vertices, 6 billion edges), the 
srcAttr and dstAttr in graph.triplets are not updated after calling 
Graph.outerJoinVertices (that is, after the vertex data has changed).

The code and the log are as follows:
{quote}
g = graph.outerJoinVertices()...
g.vertices.count()
g.edges.count()
// Print the triplets whose source is vertex 5000000001
println("example edge " + g.triplets.filter(e => e.srcId == 5000000001L).collect()
  .map(e => (e.srcId + ":" + e.srcAttr + ", " + e.dstId + ":" + e.dstAttr)).mkString("\n"))
// Print the same vertex as seen through g.vertices
println("example vertex " + g.vertices.filter(e => e._1 == 5000000001L).collect()
  .map(e => (e._1 + "," + e._2)).mkString("\n"))
{quote}

The result:
{quote}
example edge 5000000001:0, 2467451620:61
5000000001:0, 1962741310:83 // attr of vertex 5000000001 is 0 in Graph.triplets
example vertex 5000000001,2 // attr of vertex 5000000001 is 2 in Graph.vertices
{quote}

When the graph is smaller (10 million vertices), the code works as expected: 
the triplets are updated when the vertex data changes.
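
To check more than one sample vertex, the comparison above could be extended to 
every triplet. The following is only a sketch, not the code from the failing 
job: the toy vertices, edges, and the `updates` RDD are hypothetical stand-ins 
for the real data, and the join function passed to outerJoinVertices is an 
assumption, since the original call is elided. It joins the srcAttr seen 
through g.triplets with the attribute stored in g.vertices and counts the 
mismatches.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits, needed before Spark 1.3
import org.apache.spark.graphx.{Edge, Graph}

object TripletConsistencyCheck {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("triplet-check").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Hypothetical toy graph; replace with the real vertex and edge RDDs.
    val vertices = sc.parallelize(Seq((1L, 0), (2L, 0), (3L, 0)))
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(1L, 3L, 1)))
    val graph = Graph(vertices, edges)

    // Hypothetical update: vertex 1 gets the new attribute 2.
    val updates = sc.parallelize(Seq((1L, 2)))
    val g = graph.outerJoinVertices(updates) {
      (id, oldAttr, newAttr) => newAttr.getOrElse(oldAttr)
    }

    // Pair each triplet's srcAttr with the attribute held in g.vertices
    // and keep only the triplets where the two disagree.
    val stale = g.triplets
      .map(t => (t.srcId, t.srcAttr))
      .join(g.vertices)
      .filter { case (_, (fromTriplet, fromVertex)) => fromTriplet != fromVertex }
      .count()

    println("triplets with stale srcAttr: " + stale)
    sc.stop()
  }
}
```

A toy graph of this size is not expected to reproduce the problem; the check is 
meant to be run against the large graph, where a non-zero count would confirm 
that the mismatch is not limited to vertex 5000000001.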



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
