[ 
https://issues.apache.org/jira/browse/SPARK-16478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15370409#comment-15370409
 ] 

Sean Owen commented on SPARK-16478:
-----------------------------------

You can cache the result yourself if you want it cached. The RDD is computed 
whether or not it's cached. Are you further suggesting that it needs to be 
materialized before being returned? I wouldn't necessarily think so in general, 
but I could imagine so in specific cases. More info, please, on what you expect 
vs. what you observe.
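
For illustration, caching on the caller's side might look roughly like the 
sketch below. The object name SccCacheSketch, the helper run, the tiny 
four-vertex graph, and numIter = 10 are all assumptions made up for this 
example; only stronglyConnectedComponents, cache, and the RDD actions are 
GraphX/Spark calls. This is not a description of what the library does 
internally.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Graph, VertexId}

// Hypothetical sketch: the caller caches the graph returned by
// stronglyConnectedComponents before running more than one action on it.
object SccCacheSketch {

  // Returns the number of distinct component ids in a small example graph.
  def run(sc: SparkContext): Long = {
    // Assumed toy input: a cycle 1 -> 2 -> 3 -> 1 plus an extra edge 3 -> 4.
    val edges = sc.parallelize(Seq((1L, 2L), (2L, 3L), (3L, 1L), (3L, 4L)))
    val graph: Graph[Int, Int] = Graph.fromEdgeTuples(edges, defaultValue = 1)

    // Each vertex attribute in the result is the id of its component.
    val scc: Graph[VertexId, Int] = graph.stronglyConnectedComponents(numIter = 10)

    // Mark the returned graph for caching; nothing is computed yet.
    scc.cache()

    // The first action computes the result and populates the cache.
    scc.vertices.count()

    // Later actions can reuse cached partitions, as long as they fit in
    // memory, instead of recomputing the whole lineage.
    scc.vertices.map(_._2).distinct().count()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("scc-cache-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      println(s"distinct components: ${run(sc)}")
    } finally {
      sc.stop()
    }
  }
}
```

Without the cache() call, the second action would recompute the result from 
its lineage, which is the cost the report appears to describe.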

> strongly connected components doesn't cache returned RDD
> --------------------------------------------------------
>
>                 Key: SPARK-16478
>                 URL: https://issues.apache.org/jira/browse/SPARK-16478
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>    Affects Versions: 1.6.2
>            Reporter: Michał Wesołowski
>
> The Strongly Connected Components algorithm caches intermediate RDDs but 
> doesn't cache the one that is returned. When the graph is large relative to 
> available memory and one runs an action on the returned RDD, the whole RDD 
> has to be computed from scratch, which takes much more time than 
> StronglyConnectedComponents alone. 
> I managed to replicate the issue on the Databricks platform. 
> [Here|https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/4889410027417133/3634650767364730/3117184429335832/latest.html]
>  is the notebook. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

