[ https://issues.apache.org/jira/browse/SPARK-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099476#comment-14099476 ]
Larry Xiao commented on SPARK-1987:
-----------------------------------

ok. I understand. I'll try to implement it

> More memory-efficient graph construction
> ----------------------------------------
>
>                 Key: SPARK-1987
>                 URL: https://issues.apache.org/jira/browse/SPARK-1987
>             Project: Spark
>          Issue Type: Improvement
>          Components: GraphX
>            Reporter: Ankur Dave
>            Assignee: Ankur Dave
>
> A graph's edges are usually the largest component of the graph. GraphX
> currently stores edges in parallel primitive arrays, so each edge should only
> take 20 bytes to store (srcId: Long, dstId: Long, attr: Int). However, the
> current implementation in EdgePartitionBuilder uses an array of Edge objects
> as an intermediate representation for sorting, so each edge additionally
> takes about 40 bytes during graph construction (srcId (8) + dstId (8) + attr
> (4) + uncompressed pointer (8) + object overhead (8) + padding (4)). This
> unnecessarily increases GraphX's memory requirements by a factor of 3.
> To save memory, EdgePartitionBuilder should instead use a custom sort routine
> that operates directly on the three parallel arrays.
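
For illustration, here is a minimal sketch of what sorting the three parallel arrays in place could look like. It is not the EdgePartitionBuilder code and not necessarily what the eventual patch does. The object name ParallelArraySort, the method sortEdges, and the choice of (srcId, dstId) as the sort key are all assumptions made for this example. It uses a plain heapsort because heapsort needs only constant extra space, so no intermediate Edge objects are allocated.

```scala
// Hypothetical helper, not part of GraphX: sorts edges held in three
// parallel arrays in place, ordered by (srcId, dstId).
object ParallelArraySort {

  // Swap edge i and edge j in all three arrays so they stay aligned.
  private def swap(src: Array[Long], dst: Array[Long], attr: Array[Int],
                   i: Int, j: Int): Unit = {
    val s = src(i); src(i) = src(j); src(j) = s
    val d = dst(i); dst(i) = dst(j); dst(j) = d
    val a = attr(i); attr(i) = attr(j); attr(j) = a
  }

  // True if edge i orders strictly before edge j by (srcId, dstId).
  private def less(src: Array[Long], dst: Array[Long], i: Int, j: Int): Boolean =
    src(i) < src(j) || (src(i) == src(j) && dst(i) < dst(j))

  // Restore the max-heap property for the subtree rooted at `root`,
  // considering only indices in [0, end).
  private def siftDown(src: Array[Long], dst: Array[Long], attr: Array[Int],
                       root: Int, end: Int): Unit = {
    var r = root
    var done = false
    while (!done) {
      var largest = r
      // Child indices are computed as Long to avoid Int overflow on huge arrays.
      val left = 2L * r + 1
      val right = left + 1
      if (left < end && less(src, dst, largest, left.toInt)) largest = left.toInt
      if (right < end && less(src, dst, largest, right.toInt)) largest = right.toInt
      if (largest == r) done = true
      else { swap(src, dst, attr, r, largest); r = largest }
    }
  }

  // In-place heapsort over the three arrays; uses O(1) extra memory.
  def sortEdges(src: Array[Long], dst: Array[Long], attr: Array[Int]): Unit = {
    require(src.length == dst.length && dst.length == attr.length,
      "parallel arrays must have equal length")
    val n = src.length
    // Build a max-heap bottom-up.
    var i = n / 2 - 1
    while (i >= 0) { siftDown(src, dst, attr, i, n); i -= 1 }
    // Repeatedly move the current maximum to the end of the unsorted prefix.
    var end = n - 1
    while (end > 0) {
      swap(src, dst, attr, 0, end)
      siftDown(src, dst, attr, 0, end)
      end -= 1
    }
  }
}
```

With this approach the only per-edge storage during sorting is the 20 bytes already held in the arrays (8 + 8 + 4), rather than 20 + 40 = 60 bytes with the intermediate Edge objects. Heapsort is not stable, which should not matter if edges with equal (srcId, dstId) need no particular relative order; a different in-place algorithm could be substituted if stability or better cache behavior is required.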