Ankur Dave created SPARK-1987:
---------------------------------

             Summary: More memory-efficient graph construction
                 Key: SPARK-1987
                 URL: https://issues.apache.org/jira/browse/SPARK-1987
             Project: Spark
          Issue Type: Improvement
          Components: GraphX
            Reporter: Ankur Dave
            Assignee: Ankur Dave


A graph's edges are usually the largest component of the graph. GraphX
currently stores edges in parallel primitive arrays, so each edge should take
only 20 bytes to store (srcId: Long, dstId: Long, attr: Int). However, the
current implementation in EdgePartitionBuilder uses an array of Edge objects as
an intermediate representation for sorting, so each edge takes about 40
additional bytes during graph construction (srcId (8) + dstId (8) + attr (4) +
uncompressed pointer (8) + object overhead (8) + padding (4)). This
unnecessarily increases GraphX's peak memory requirements during construction
by a factor of 3 (about 60 bytes per edge instead of 20).
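
The arithmetic behind that factor can be written out as a small sketch. The
byte counts are the estimates quoted above, not measurements, and the object
name EdgeMemoryEstimate is hypothetical (it is not part of GraphX):

```scala
object EdgeMemoryEstimate {
  // Bytes per edge in the parallel primitive arrays: srcId + dstId + attr.
  val arrayBytes: Int = 8 + 8 + 4

  // Extra bytes per edge while it also lives in the intermediate Array[Edge]:
  // srcId + dstId + attr + uncompressed pointer + object overhead + padding.
  val edgeObjectBytes: Int = 8 + 8 + 4 + 8 + 8 + 4

  // Peak per-edge footprint during construction, and its ratio to the
  // footprint of the parallel arrays alone.
  val peakBytes: Int = arrayBytes + edgeObjectBytes
  val factor: Double = peakBytes.toDouble / arrayBytes
}
```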

To save memory, EdgePartitionBuilder should instead use a custom sort routine 
that operates directly on the three parallel arrays.
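
As a rough illustration of the idea, and not the actual GraphX change, the
sketch below sorts three parallel arrays in place with a quicksort. It assumes
edges are ordered lexicographically by (srcId, dstId) and that attr is an Int;
the object name ParallelEdgeSort and its methods are hypothetical.

```scala
object ParallelEdgeSort {
  /** Swap edges i and j in all three arrays so the columns stay aligned. */
  private def swap(src: Array[Long], dst: Array[Long], attr: Array[Int],
                   i: Int, j: Int): Unit = {
    val s = src(i); src(i) = src(j); src(j) = s
    val d = dst(i); dst(i) = dst(j); dst(j) = d
    val a = attr(i); attr(i) = attr(j); attr(j) = a
  }

  /** Lexicographic comparison on (srcId, dstId): is (s1, d1) < (s2, d2)? */
  private def less(s1: Long, d1: Long, s2: Long, d2: Long): Boolean =
    s1 < s2 || (s1 == s2 && d1 < d2)

  /** Sort the edges in place by (srcId, dstId) without allocating Edge objects. */
  def sort(src: Array[Long], dst: Array[Long], attr: Array[Int]): Unit = {
    require(src.length == dst.length && dst.length == attr.length,
      "parallel arrays must have the same length")
    quicksort(src, dst, attr, 0, src.length - 1)
  }

  private def quicksort(src: Array[Long], dst: Array[Long], attr: Array[Int],
                        lo: Int, hi: Int): Unit = {
    var l = lo
    var h = hi
    while (l < h) {
      // Copy the pivot key out so later swaps cannot change it.
      val mid = l + (h - l) / 2
      val ps = src(mid)
      val pd = dst(mid)
      var i = l
      var j = h
      while (i <= j) {
        while (less(src(i), dst(i), ps, pd)) i += 1
        while (less(ps, pd, src(j), dst(j))) j -= 1
        if (i <= j) {
          swap(src, dst, attr, i, j)
          i += 1
          j -= 1
        }
      }
      // Recurse into the smaller half and loop on the larger one,
      // which keeps the stack depth logarithmic in the number of edges.
      if (j - l < h - i) {
        quicksort(src, dst, attr, l, j)
        l = i
      } else {
        quicksort(src, dst, attr, i, h)
        h = j
      }
    }
  }
}
```

Calling ParallelEdgeSort.sort(srcIds, dstIds, attrs) reorders all three arrays
together. The only extra memory is a few local variables and the recursion
stack, so no per-edge objects or pointers are allocated. This quicksort is not
stable; a real implementation might prefer a different algorithm.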



--
This message was sent by Atlassian JIRA
(v6.2#6252)
