Petar Zecevic created SPARK-13313:
-------------------------------------

             Summary: Strongly connected components doesn't find all strongly connected components
                 Key: SPARK-13313
                 URL: https://issues.apache.org/jira/browse/SPARK-13313
             Project: Spark
          Issue Type: Bug
          Components: GraphX
    Affects Versions: 1.6.0
            Reporter: Petar Zecevic

The strongly connected components algorithm doesn't find all strongly connected components. I ran it on the Wikispeedia dataset (http://snap.stanford.edu/data/wikispeedia.html): it found 519 SCCs, and one of them contained 4051 vertices that in reality don't have any edges between them.

I think the problem is on line 89 of StronglyConnectedComponents.scala, where EdgeDirection.In should be changed to EdgeDirection.Out. I believe the second Pregel call should use the Out edge direction, the same as the first call, because the direction is already reversed in the provided sendMsg function (the message is sent to the source vertex and not to the destination vertex).

With that change on line 89, the algorithm starts finding many more SCCs, but eventually a stack overflow exception occurs. I believe graph objects that are changed across iterations should be checkpointed rather than just cached.
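
The "no edges between them" observation can be checked independently of GraphX. The following is a minimal sketch in plain Python, not Spark code: the function name `suspicious_components`, the edge list of `(src, dst)` pairs, and the vertex-to-component dictionary are assumptions made for illustration. It relies on a necessary condition only: in a strongly connected component with more than one vertex, every vertex must have at least one outgoing and one incoming edge to other members of the same component. A reported component that fails this test cannot be a real SCC; one that passes is not thereby proven correct.

```python
from collections import defaultdict


def suspicious_components(edges, component_of):
    """Return sorted ids of reported components that cannot be real SCCs.

    edges        -- iterable of (src, dst) vertex pairs (hypothetical input format)
    component_of -- dict mapping each vertex to the component id reported for it
    """
    # Group vertices by the component id they were assigned.
    members = defaultdict(set)
    for vertex, comp in component_of.items():
        members[comp].add(vertex)

    # Record which vertices have an out-edge / in-edge that stays inside
    # their own reported component (self-loops do not count).
    has_out, has_in = set(), set()
    for src, dst in edges:
        if src == dst or src not in component_of or dst not in component_of:
            continue
        if component_of[src] == component_of[dst]:
            has_out.add(src)
            has_in.add(dst)

    # A multi-vertex component is suspicious if any member lacks an
    # internal out-edge or an internal in-edge.
    bad = [
        comp
        for comp, vertices in members.items()
        if len(vertices) > 1
        and any(v not in has_out or v not in has_in for v in vertices)
    ]
    return sorted(bad)


if __name__ == "__main__":
    edges = [(1, 2), (2, 1), (3, 4)]
    # Vertices 3 and 4 are wrongly reported as one component: there is no edge 4 -> 3.
    reported = {1: 1, 2: 1, 3: 3, 4: 3}
    print(suspicious_components(edges, reported))
```

Run against the output of the SCC algorithm (vertex id paired with the assigned component id) and the original edge list, a check like this would flag a component such as the 4051-vertex one described above, whose members share no edges.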