Petar Zecevic created SPARK-13313:
-------------------------------------

             Summary: Strongly connected components doesn't find all strongly 
connected components
                 Key: SPARK-13313
                 URL: https://issues.apache.org/jira/browse/SPARK-13313
             Project: Spark
          Issue Type: Bug
          Components: GraphX
    Affects Versions: 1.6.0
            Reporter: Petar Zecevic


The strongly connected components (SCC) algorithm doesn't find all strongly 
connected components. I was using the Wikispeedia dataset 
(http://snap.stanford.edu/data/wikispeedia.html), and the algorithm found 519 
SCCs. One of them had 4051 vertices which, in reality, don't have any edges 
between them. 
I think the problem could be on line 89 of StronglyConnectedComponents.scala, 
where EdgeDirection.In should be changed to EdgeDirection.Out. I believe the 
second Pregel call should use the Out edge direction, the same as the first 
call, because the direction is already reversed in the provided sendMsg 
function (the message is sent to the source vertex and not the destination 
vertex).
If that is changed (line 89), the algorithm starts finding many more SCCs, but 
eventually a stack overflow exception occurs. I believe graph objects that are 
changed through iterations should be checkpointed rather than cached.
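
The observation above, that a reported component contains vertices with no 
edges between them, can be checked locally against the edge list. The 
following is a minimal sketch in plain Python, not GraphX code and not the 
method used for this report: the function names and the toy edge list are 
hypothetical, and it assumes the graph fits in memory. It computes reference 
SCCs with Kosaraju's algorithm (written iteratively to avoid recursion 
limits) and tests whether a given vertex set has any internal edge at all.

```python
from collections import defaultdict


def strongly_connected_components(vertices, edges):
    """Return a list of SCCs (each a set of vertices) via Kosaraju's algorithm."""
    fwd, rev = defaultdict(list), defaultdict(list)
    for src, dst in edges:
        fwd[src].append(dst)
        rev[dst].append(src)

    # Pass 1: iterative DFS on the forward graph, recording finish order.
    visited, order = set(), []
    for root in vertices:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(fwd[root]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(fwd[nxt])))
                    break
            else:
                # All neighbours explored: the node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: DFS on the reversed graph in decreasing finish order.
    assigned, components = set(), []
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        component, stack = set(), [root]
        while stack:
            node = stack.pop()
            component.add(node)
            for prev in rev[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(component)
    return components


def has_internal_edge(component, edges):
    """True if at least one edge has both endpoints inside the component."""
    return any(src in component and dst in component for src, dst in edges)


# Hypothetical toy graph: a 3-cycle (1 -> 2 -> 3 -> 1) with a tail 3 -> 4 -> 5.
edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5)]
components = strongly_connected_components(range(1, 6), edges)
```

Any component with more than one vertex for which has_internal_edge returns 
False cannot be a valid SCC. The converse does not hold: an internal edge is 
necessary but not sufficient, so comparing against the reference components 
is the stronger check.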




