Ankur Dave created SPARK-1988:
---------------------------------

             Summary: Enable storing edges out-of-core
                 Key: SPARK-1988
                 URL: https://issues.apache.org/jira/browse/SPARK-1988
             Project: Spark
          Issue Type: Improvement
          Components: GraphX
            Reporter: Ankur Dave
            Assignee: Ankur Dave


A graph's edges are usually its largest component, and a cluster may not have 
enough memory to hold them. For example, at 20 bytes per edge, a graph with 
20 billion edges requires at least 400 GB of memory for the edges alone.
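
The estimate above can be reproduced with a short calculation. This is only a 
sketch: the split of the 20 bytes into two 8-byte vertex IDs plus a 4-byte 
edge attribute is an assumption made for illustration, and the function name 
is hypothetical.

```python
# Back-of-the-envelope estimate of raw edge storage.
# Assumption (not stated in the issue): 20 bytes per edge breaks down as
# two 8-byte vertex IDs plus one 4-byte edge attribute.
SRC_ID_BYTES = 8
DST_ID_BYTES = 8
ATTR_BYTES = 4
BYTES_PER_EDGE = SRC_ID_BYTES + DST_ID_BYTES + ATTR_BYTES


def edge_storage_gb(num_edges: int, bytes_per_edge: int = BYTES_PER_EDGE) -> float:
    """Return raw edge storage in decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_edges * bytes_per_edge / 10**9


print(edge_storage_gb(20 * 10**9))  # 400.0
```

This counts only the packed edge records; any per-object or indexing overhead 
would come on top, which is why the figure is a lower bound.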

GraphX only ever accesses the edges in two ways: full table scans, and 
clustered-index scans on source vertex ID. Both patterns read edges 
sequentially, so the edges are amenable to being stored on disk. EdgePartition 
should provide the option of storing edges on disk transparently and streaming 
through them as needed.
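
To make the two access patterns concrete, here is a minimal sketch of edges 
stored on disk in source-ID order and streamed back. It is written in Python 
for brevity and is not GraphX code or the proposed EdgePartition design: the 
20-byte record layout, the in-memory index of per-source runs, and all 
function names are assumptions made for illustration.

```python
import os
import struct
import tempfile
from typing import Dict, Iterator, List, Tuple

# Hypothetical record layout: int64 src, int64 dst, float32 attr (20 bytes).
EDGE = struct.Struct("<qqf")
Edge = Tuple[int, int, float]
RunIndex = Dict[int, Tuple[int, int]]  # src -> (first record number, count)


def write_edges(path: str, edges: List[Edge]) -> RunIndex:
    """Write edges clustered by source ID and return the per-source runs."""
    index: RunIndex = {}
    with open(path, "wb") as f:
        for i, (src, dst, attr) in enumerate(sorted(edges)):
            f.write(EDGE.pack(src, dst, attr))
            first, count = index.get(src, (i, 0))
            index[src] = (first, count + 1)
    return index


def scan_all(path: str, batch: int = 1024) -> Iterator[Edge]:
    """Full scan: stream all edges, reading `batch` records at a time."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(EDGE.size * batch)
            if not chunk:
                break
            yield from EDGE.iter_unpack(chunk)


def scan_source(path: str, index: RunIndex, src: int) -> Iterator[Edge]:
    """Clustered scan: seek to one source vertex's run and read only that."""
    if src not in index:
        return
    first, count = index[src]
    with open(path, "rb") as f:
        f.seek(first * EDGE.size)
        yield from EDGE.iter_unpack(f.read(count * EDGE.size))


# Small demonstration with three edges in a temporary file.
edges = [(2, 3, 0.5), (1, 2, 1.0), (1, 3, 2.0)]
path = os.path.join(tempfile.mkdtemp(), "edges.bin")
index = write_edges(path, edges)

print(list(scan_all(path)))              # every edge, in source-ID order
print(list(scan_source(path, index, 1)))  # only the edges leaving vertex 1
```

Because the file is sorted by source ID, both scans touch a contiguous byte 
range, so neither needs the whole edge set in memory at once.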



--
This message was sent by Atlassian JIRA
(v6.2#6252)
