[ https://issues.apache.org/jira/browse/GIRAPH-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151823#comment-13151823 ]
Arun Suresh commented on GIRAPH-91:
-----------------------------------

Avery,

I see that you have used two sorted ArrayLists. Couldn't a LinkedHashMap have been an alternative? I understand that getEdgeValue and hasEdgeValue would be faster with sorted ArrayLists, and that ArrayLists are more compact.

But I was just wondering: if the graph is truly large (millions of edges for a single vertex), would it make sense to hold the entire edge list in memory in the first place? We might need a scheme where only part of the list is in memory and chunks of the list are fetched on demand when the provided iterator calls next(). In that case we could have a hybrid array + linked list (a linked list of chunks of the edge list).

> Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)
> -----------------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-91
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-91
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Avery Ching
>            Assignee: Avery Ching
>         Attachments: GIRAPH-91.diff
>
>
> Current vertex implementation uses a HashMap for storing the edges, which is
> quite memory heavy for large graphs. The default settings in Giraph need to
> be improved for large graphs and heaps of >20G.
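
To make the chunked edge list idea from the comment concrete, here is a minimal sketch of an iterator that fetches one chunk at a time when next() runs past the current chunk. It is only an illustration under stated assumptions: ChunkedEdgeList and ChunkSource are hypothetical names, not Giraph classes, edges are reduced to long target vertex ids, and where the chunks actually live (disk, HDFS, another process) is left to the ChunkSource implementation.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 * Hypothetical sketch of a chunked edge list. None of these names are
 * Giraph APIs; edges are simplified to long target vertex ids.
 */
public class ChunkedEdgeList implements Iterable<Long> {

    /** Assumed loader: returns chunk number chunkIndex, or null past the last chunk. */
    public interface ChunkSource {
        long[] fetch(int chunkIndex);
    }

    private final ChunkSource source;

    public ChunkedEdgeList(ChunkSource source) {
        this.source = source;
    }

    @Override
    public Iterator<Long> iterator() {
        return new Iterator<Long>() {
            private long[] current = new long[0]; // the only chunk held in memory
            private int pos = 0;                  // read position inside current
            private int nextChunk = 0;            // index of the next chunk to fetch
            private boolean exhausted = false;    // set once the source returns null

            /** Fetches chunks until one has unread entries or the source ends. */
            private void advance() {
                while (!exhausted && pos >= current.length) {
                    long[] chunk = source.fetch(nextChunk++);
                    if (chunk == null) {
                        exhausted = true;
                        current = new long[0];
                    } else {
                        current = chunk; // the previous chunk becomes garbage
                    }
                    pos = 0;
                }
            }

            @Override
            public boolean hasNext() {
                advance();
                return pos < current.length;
            }

            @Override
            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current[pos++];
            }
        };
    }
}
```

The trade-off is that only sequential iteration stays cheap here. A lookup such as getEdgeValue would need either a scan over all chunks or a separate index over chunk boundaries, which is one reason the sorted in-memory lists may still be preferable when the edge list fits in the heap.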