[ https://issues.apache.org/jira/browse/SPARK-13313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-13313.
-------------------------------
Resolution: Cannot Reproduce
> Strongly connected components doesn't find all strongly connected components
> ----------------------------------------------------------------------------
>
> Key: SPARK-13313
> URL: https://issues.apache.org/jira/browse/SPARK-13313
> Project: Spark
> Issue Type: Bug
> Components: GraphX
> Affects Versions: 1.6.0
> Reporter: Petar Zecevic
>
> The strongly connected components algorithm doesn't find all strongly
> connected components. I was using the Wikispeedia dataset
> (http://snap.stanford.edu/data/wikispeedia.html) and the algorithm found 519
> SCCs, one of which had 4051 vertices that in reality have no edges between
> them.
> I think the problem could be on line 89 of the
> StronglyConnectedComponents.scala file, where EdgeDirection.In should be
> changed to EdgeDirection.Out. I believe the second Pregel call should use the
> Out edge direction, the same as the first call, because the direction is
> already reversed in the provided sendMsg function (the message is sent to the
> source vertex and not the destination vertex).
> If that is changed (line 89), the algorithm starts finding many more SCCs,
> but eventually a stack overflow exception occurs. I believe graph objects
> that are changed through iterations should not be cached, but checkpointed.
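
For anyone trying to reproduce the report, a small reference check that runs
outside Spark can help confirm whether a reported component really is strongly
connected. The sketch below is not the GraphX implementation. It is a plain
Python illustration of the underlying idea: the SCC of a vertex is the
intersection of the vertices reachable from it along out-edges and along
in-edges. The function names (reachable, strongly_connected_components,
is_valid_scc) are hypothetical helpers introduced here, and the traversal is
iterative, so it does not rely on deep recursion. It is meant for small
samples only, since it re-runs a traversal for each component.

```python
from collections import defaultdict, deque


def reachable(start, adjacency):
    """Return the set of vertices reachable from start (iterative BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adjacency.get(v, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def strongly_connected_components(vertices, edges):
    """Return a list of SCCs (as sets) for a directed graph.

    vertices: iterable of hashable vertex ids
    edges: iterable of (src, dst) pairs
    """
    out_adj = defaultdict(list)  # src -> destinations
    in_adj = defaultdict(list)   # dst -> sources (the reversed graph)
    for src, dst in edges:
        out_adj[src].append(dst)
        in_adj[dst].append(src)

    vertices = list(vertices)
    remaining = set(vertices)
    components = []
    for v in vertices:
        if v not in remaining:
            continue
        # Forward pass follows out-edges, backward pass follows in-edges.
        forward = reachable(v, out_adj)
        backward = reachable(v, in_adj)
        scc = forward & backward
        components.append(scc)
        remaining -= scc
    return components


def is_valid_scc(component, edges):
    """Check that a reported component is strongly connected on its own edges."""
    members = set(component)
    inner = [(s, d) for s, d in edges if s in members and d in members]
    return len(strongly_connected_components(list(members), inner)) == 1


if __name__ == "__main__":
    sample_edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 4)]
    print(strongly_connected_components([1, 2, 3, 4, 5, 6], sample_edges))
    print(is_valid_scc({1, 4}, sample_edges))
```

A set of vertices with no edges between them, like the 4051-vertex component
described in the report, fails is_valid_scc whenever it has more than one
member, which makes this a quick sanity check on a sampled component.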