Ankur Dave created SPARK-1988:
---------------------------------
Summary: Enable storing edges out-of-core
Key: SPARK-1988
URL: https://issues.apache.org/jira/browse/SPARK-1988
Project: Spark
Issue Type: Improvement
Components: GraphX
Reporter: Ankur Dave
Assignee: Ankur Dave
A graph's edges are usually the largest component of the graph, and a cluster
may not have enough memory to hold them. For example, a graph with 20 billion
edges requires at least 400 GB of memory, because each edge takes 20 bytes.
GraphX only ever accesses the edges through full table scans or through cluster
scans that use the clustered index on source vertex ID. Both access patterns
read edges sequentially, so the edges are amenable to being stored on disk.
EdgePartition should provide the option of storing edges on disk transparently
and streaming through them as needed.
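
To make the idea concrete, here is a minimal sketch in Python (not GraphX code) of
edges kept in a file sorted by source vertex ID and read back by streaming. The
names EDGE, estimated_bytes, write_edges, scan_edges and edges_from are
hypothetical, and the 20-byte record layout (8-byte source ID, 8-byte
destination ID, 4-byte attribute) is only an assumption chosen to match the
per-edge size quoted above.

```python
import struct
from typing import Iterable, Iterator, Tuple

# Assumed layout (illustrative only): src ID, dst ID, 4-byte attribute.
# "<" disables padding, so each record is exactly 8 + 8 + 4 = 20 bytes.
EDGE = struct.Struct("<qqi")
Edge = Tuple[int, int, int]


def estimated_bytes(num_edges: int) -> int:
    """Raw storage needed for num_edges fixed-size edge records."""
    return num_edges * EDGE.size


def write_edges(path: str, edges: Iterable[Edge]) -> None:
    """Write edges clustered by source ID.

    Sorting happens in memory here for brevity; a real out-of-core
    implementation would need an external sort.
    """
    with open(path, "wb") as f:
        for edge in sorted(edges):
            f.write(EDGE.pack(*edge))


def scan_edges(path: str, chunk_edges: int = 4096) -> Iterator[Edge]:
    """Full table scan: stream every edge, one fixed-size chunk at a time."""
    with open(path, "rb") as f:
        while True:
            buf = f.read(EDGE.size * chunk_edges)
            if not buf:
                break
            yield from EDGE.iter_unpack(buf)


def edges_from(path: str, src: int) -> Iterator[Edge]:
    """Cluster scan: binary-search to the first edge of src, then stream."""
    with open(path, "rb") as f:
        f.seek(0, 2)  # seek to end of file to learn the record count
        lo, hi = 0, f.tell() // EDGE.size
        while lo < hi:
            mid = (lo + hi) // 2
            f.seek(mid * EDGE.size)
            if EDGE.unpack(f.read(EDGE.size))[0] < src:
                lo = mid + 1
            else:
                hi = mid
        f.seek(lo * EDGE.size)
        while True:
            buf = f.read(EDGE.size)
            if not buf:
                break
            edge = EDGE.unpack(buf)
            if edge[0] != src:
                break
            yield edge
```

Under these assumptions, estimated_bytes(20_000_000_000) gives 400,000,000,000
bytes, which is the 400 GB figure above. Neither scan holds more than one chunk
of edges in memory at a time.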