[ https://issues.apache.org/jira/browse/SPARK-13313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15146534#comment-15146534 ]
Sean Owen commented on SPARK-13313:
-----------------------------------

Can you be more specific? For example, specific cases from the data, and a pull request?

> Strongly connected components doesn't find all strongly connected components
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-13313
>                 URL: https://issues.apache.org/jira/browse/SPARK-13313
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>    Affects Versions: 1.6.0
>            Reporter: Petar Zecevic
>
> The strongly connected components algorithm doesn't find all strongly
> connected components. I was using the Wikispeedia dataset
> (http://snap.stanford.edu/data/wikispeedia.html) and the algorithm found 519
> SCCs. One of them had 4051 vertices, which in reality don't have any edges
> between them.
> I think the problem could be on line 89 of the
> StronglyConnectedComponents.scala file, where EdgeDirection.In should be
> changed to EdgeDirection.Out. I believe the second Pregel call should use the
> Out edge direction, the same as the first call, because the direction is
> already reversed in the provided sendMsg function (the message is sent to the
> source vertex and not the destination vertex).
> If that is changed (line 89), the algorithm starts finding many more SCCs,
> but eventually a stack overflow exception occurs. I believe graph objects
> that are changed through iterations should not be cached, but checkpointed.
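
One way to produce a specific example for a report like this is to compare the reported components against an independent reference on a small edge list. The following is only a sketch under stated assumptions: it uses Kosaraju's two-pass algorithm in plain Python, not the Pregel-based approach in GraphX, and the function names and the toy edge list are hypothetical, not taken from the issue or from the Wikispeedia data.

```python
from collections import defaultdict


def strongly_connected_components(edges):
    """Reference SCCs via Kosaraju's algorithm (iterative, so no deep recursion).

    `edges` is an iterable of (src, dst) pairs; returns a list of frozensets.
    """
    fwd, rev, nodes = defaultdict(list), defaultdict(list), set()
    for src, dst in edges:
        fwd[src].append(dst)
        rev[dst].append(src)
        nodes.update((src, dst))

    # Pass 1: record vertices in order of DFS completion on the forward graph.
    order, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(fwd[start]))]
        while stack:
            v, neighbours = stack[-1]
            for w in neighbours:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(fwd[w])))
                    break
            else:
                # All out-neighbours explored: v is finished.
                order.append(v)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing finish order.
    components, assigned = [], set()
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, todo = {start}, [start]
        while todo:
            v = todo.pop()
            for w in rev[v]:
                if w not in assigned:
                    assigned.add(w)
                    component.add(w)
                    todo.append(w)
        components.append(frozenset(component))
    return components


def has_internal_edge(component, edges):
    """Necessary (not sufficient) check: a multi-vertex SCC must contain
    at least one edge whose endpoints are both members."""
    members = set(component)
    if len(members) <= 1:
        return True
    return any(s in members and d in members for s, d in edges)


# Hypothetical toy graph: a 3-cycle, a 2-cycle, and a one-way edge.
toy_edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 4), (6, 7)]
reference = set(strongly_connected_components(toy_edges))
```

Any component reported by the algorithm under test that fails `has_internal_edge`, or that does not appear in `reference`, would be a concrete case to attach to the issue.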