[ https://issues.apache.org/jira/browse/SPARK-16478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-16478.
-------------------------------
Resolution: Fixed
Fix Version/s: 2.1.0
Issue resolved by pull request 14137
[https://github.com/apache/spark/pull/14137]
> strongly connected components doesn't cache returned RDD
> --------------------------------------------------------
>
> Key: SPARK-16478
> URL: https://issues.apache.org/jira/browse/SPARK-16478
> Project: Spark
> Issue Type: Bug
> Components: GraphX
> Affects Versions: 1.6.2
> Reporter: Michał Wesołowski
> Fix For: 2.1.0
>
>
> The Strongly Connected Components algorithm caches its intermediate RDDs but
> does not cache the one it returns. When the graph is large relative to the
> available memory, any action on the returned RDD forces the whole RDD to be
> recomputed from scratch, which takes much more time than running
> StronglyConnectedComponents alone.
> I managed to reproduce the issue on the Databricks platform.
> [Here|https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/4889410027417133/3634650767364730/3117184429335832/latest.html]
> is the notebook.
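
On affected versions such as 1.6.2, one possible workaround is to cache the returned graph yourself and materialize it once before running further actions. The sketch below is an illustration only and is not taken from pull request 14137. The object name, the toy edge list, the local master setting, and the iteration count of 10 are all assumptions made for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

// Hypothetical driver object; the name is chosen for this example only.
object SccCacheSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("scc-cache-sketch")
      .setMaster("local[*]") // assumption: local run for illustration
    val sc = new SparkContext(conf)
    try {
      // Toy graph (assumed data): a cycle 1 -> 2 -> 3 -> 1 plus an edge 3 -> 4.
      val edges = sc.parallelize(Seq(
        Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(3L, 1L, 1), Edge(3L, 4L, 1)))
      val graph = Graph.fromEdges(edges, defaultValue = 0)

      // Cache the graph returned by the algorithm so that later actions
      // reuse it instead of recomputing the full lineage.
      // The iteration count (10) is an arbitrary choice for this sketch.
      val scc = graph.stronglyConnectedComponents(10).cache()

      // First action: computes the result and populates the cache.
      scc.vertices.count()

      // Later actions read the cached vertices, memory permitting.
      scc.vertices.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Caching only helps to the extent that the result fits in storage memory. Partitions that are evicted will still be recomputed from their lineage.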