[ https://issues.apache.org/jira/browse/SPARK-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ankur Dave resolved SPARK-1988.
-------------------------------
Resolution: Fixed
This is mitigated by SPARK-1991, because the user can increase the number of
edge partitions so that each edge partition individually fits in memory, then
set the storage level of the edges to MEMORY_AND_DISK.
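
A minimal sketch of this mitigation, assuming a Spark release that includes SPARK-1991, where `Graph.fromEdges` accepts `edgeStorageLevel` and `vertexStorageLevel` arguments. The sample edges, the partition count of 8, the local master, and the object name are illustrative assumptions, not values taken from the issue:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.storage.StorageLevel

// Hypothetical driver program; the name is chosen only for this example.
object OutOfCoreEdgesSketch {
  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch can run without a cluster.
    val conf = new SparkConf().setAppName("OutOfCoreEdgesSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Tiny stand-in edge list; a real workload would load edges from storage.
      val edges = sc.parallelize(Seq(
        Edge(1L, 2L, 1),
        Edge(2L, 3L, 1),
        Edge(3L, 1L, 1)))

      // Assumed value: pick it large enough that one edge partition fits in memory.
      val numEdgePartitions = 8

      // Edges may spill to disk; vertices stay in memory in this sketch.
      val graph = Graph.fromEdges(
        edges.repartition(numEdgePartitions),
        defaultValue = 0,
        edgeStorageLevel = StorageLevel.MEMORY_AND_DISK,
        vertexStorageLevel = StorageLevel.MEMORY_ONLY)

      println(s"edge partitions: ${graph.edges.partitions.length}")
      println(s"edge count: ${graph.edges.count()}")
    } finally {
      sc.stop()
    }
  }
}
```

With `MEMORY_AND_DISK`, Spark keeps the edge partitions that fit in memory and writes the rest to local disk, reading them back when they are needed. Raising the partition count makes each partition smaller, at the cost of more tasks per stage.
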
> Enable storing edges out-of-core
> --------------------------------
>
> Key: SPARK-1988
> URL: https://issues.apache.org/jira/browse/SPARK-1988
> Project: Spark
> Issue Type: Improvement
> Components: GraphX
> Reporter: Ankur Dave
> Assignee: Ankur Dave
> Priority: Minor
>
> A graph's edges are usually the largest component of the graph, and a cluster
> may not have enough memory to hold them. For example, a graph with 20 billion
> edges requires at least 400 GB of memory, because each edge takes 20 bytes.
> GraphX only ever accesses the edges using full table scans or cluster scans
> using the clustered index on source vertex ID. The edges are therefore
> amenable to being stored on disk. EdgePartition should provide the option of
> storing edges on disk transparently and streaming through them as needed.