[
https://issues.apache.org/jira/browse/HAMA-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582976#comment-13582976
]
Thomas Jungblut commented on HAMA-704:
--------------------------------------
So I want to share the overall plan now.
Our requirements are:
- Vertex and edge values (plus possible user-added state) change in every
superstep.
- A vertex can be active or inactive; once it has called voteToHalt() it stays
inactive until an incoming message re-activates it (see the sketch after this
list).
- Be as memory-efficient as possible; for future fault-tolerance needs it is
necessary to spill the changing parts to disk so the job can be restarted.
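To make the second requirement concrete, here is a minimal sketch of the
activity semantics, using hypothetical names rather than the actual Hama API:
an inactive vertex is skipped unless a message arrives for it, and a vertex
deactivates itself via voteToHalt().
{code:java}
import java.util.List;
import java.util.Map;

abstract class SketchVertex<M> {
  boolean active = true;

  abstract void compute(Iterable<M> messages);

  // Called by the vertex itself when it has nothing more to do.
  void voteToHalt() {
    active = false;
  }
}

class SuperstepRunner<M> {
  void runSuperstep(List<SketchVertex<M>> vertices,
                    Map<Integer, List<M>> messagesByIndex) {
    for (int i = 0; i < vertices.size(); i++) {
      SketchVertex<M> v = vertices.get(i);
      List<M> msgs = messagesByIndex.getOrDefault(i, List.of());
      if (!msgs.isEmpty()) {
        v.active = true; // a message re-activates a halted vertex
      }
      if (v.active) {
        v.compute(msgs);
      }
    }
  }
}
{code}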
So how can we tackle this with an architecture?
*The graph part*
We divide the graph into two parts:
1. A static graph that only contains the IDs, sorted ascending by vertex ID,
each vertex ID followed by its outgoing edge IDs
(VertexID1, EdgeID1, EdgeID2, VertexID2, EdgeID5, ...).
2. A changing graph (the soft/liquid graph) that contains the soft attributes
which change very often. Its format parallels the static graph, so both can be
read in lockstep (sketched below):
(VertexValue1, EdgeValue1, EdgeValue2, VertexValue2, EdgeValue5, ...).
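A small sketch of how the two files could be written in parallel. The concrete
types (long IDs, double values) and the out-degree field are assumptions for
illustration only; the degree is written here just so a reader of the static
file knows how many edge IDs follow each vertex ID.
{code:java}
import java.io.DataOutputStream;
import java.io.IOException;

class GraphWriterSketch {
  void writeVertex(DataOutputStream staticOut, DataOutputStream softOut,
                   long vertexId, double vertexValue,
                   long[] edgeIds, double[] edgeValues) throws IOException {
    staticOut.writeLong(vertexId);       // static: vertex id
    softOut.writeDouble(vertexValue);    // soft:   vertex value (same position)
    staticOut.writeInt(edgeIds.length);  // static: out-degree (assumed encoding)
    for (int i = 0; i < edgeIds.length; i++) {
      staticOut.writeLong(edgeIds[i]);    // static: edge target id
      softOut.writeDouble(edgeValues[i]); // soft:   edge value (same position)
    }
  }
}
{code}
Because every ID in the static file has a value at the same logical position in
the soft file, a reader can advance both streams together in one forward pass.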
Both files are on disk. The static graph must be written while the vertices are
received from the partitioning and sorted. The soft part is written every
superstep to disk (or direct memory, or whatever sink the VerticesInfo defines)
after each vertex has computed its new state.
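A hypothetical sink abstraction (not the real VerticesInfo API) illustrating
how the changing soft state could be flushed to a pluggable target after each
vertex computes its new state:
{code:java}
import java.io.IOException;

// Implementations could target a local file, direct memory, or any other sink.
interface SoftStateSink<V, E> {
  void startSuperstep(long superstep) throws IOException;
  void writeVertexState(V vertexValue, E[] edgeValues) throws IOException;
  void finishSuperstep() throws IOException;
}
{code}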
Therefore we need some additional bookkeeping. For the activeness of a vertex
we can use a BitSet with an index mapping (we can index-map everything because
the graph is sorted), e.g. bit 1 set means vertex1 is active. To seek to the
next active vertex without deserializing the whole graph, we need another index
mapping: a long[] that contains the byte offset where the values of the vertex
at the given index start. This is mainly for seeking purposes, so we can jump
to the next active vertex in the set without deserializing the part of the
graph in between. Anyway, this has very low memory overhead (pretty much none)
and it scales extremely well, so we can make GraphChi look like a real loser.
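A sketch of that bookkeeping, with assumed names: a BitSet marks which vertex
indices are active, and a parallel long[] holds the byte offset where each
vertex's soft values start, so the reader can seek straight to the next active
vertex.
{code:java}
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.BitSet;

class ActiveVertexSeeker {
  private final BitSet active;       // bit i set => vertex at index i is active
  private final long[] softOffsets;  // byte offset of vertex i in the soft file
  private final RandomAccessFile softFile;

  ActiveVertexSeeker(BitSet active, long[] softOffsets,
                     RandomAccessFile softFile) {
    this.active = active;
    this.softOffsets = softOffsets;
    this.softFile = softFile;
  }

  /** Seeks to the next active vertex at or after fromIndex; returns its index,
      or -1 if none is left in this superstep. */
  int seekToNextActive(int fromIndex) throws IOException {
    int next = active.nextSetBit(fromIndex);
    if (next >= 0) {
      softFile.seek(softOffsets[next]); // skip all inactive vertices in between
    }
    return next;
  }
}
{code}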
Suraj and I talked about it, and he is going to make the messaging more
scalable, while I focus on the graph storage according to the plan above. Our
interface between the two is a message queue sorted ascending by vertex ID.
While Suraj is working on it, I will use the SortedMessagingQueue in memory to
simulate the behaviour.
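A minimal in-memory stand-in for that contract (the actual SortedMessagingQueue
API may differ): messages come back grouped by vertex ID in ascending order,
which lets the graph side merge them with the sorted vertex files in a single
forward pass.
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class InMemorySortedMessageQueue<M> {
  private final TreeMap<Long, List<M>> messagesByVertexId = new TreeMap<>();

  void add(long vertexId, M message) {
    messagesByVertexId.computeIfAbsent(vertexId, k -> new ArrayList<>())
                      .add(message);
  }

  /** Iterates entries ascending by vertex ID, matching the sorted graph files. */
  Iterable<Map.Entry<Long, List<M>>> ascendingEntries() {
    return messagesByVertexId.entrySet();
  }
}
{code}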
> Optimization of memory usage during message processing
> ------------------------------------------------------
>
> Key: HAMA-704
> URL: https://issues.apache.org/jira/browse/HAMA-704
> Project: Hama
> Issue Type: Improvement
> Components: graph
> Reporter: Edward J. Yoon
> Assignee: Edward J. Yoon
> Priority: Critical
> Fix For: 0.6.1
>
> Attachments: HAMA-704.patch-v1, hama-704_v05.patch,
> HAMA-704-v2.patch, localdisk.patch, mytest.patch, patch.txt, patch.txt,
> removeMsgMap.patch
>
>
> The <vertex, message> map seems to consume a lot of memory. We should figure
> out an efficient way to reduce memory usage.