[
https://issues.apache.org/jira/browse/HAMA-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582976#comment-13582976
]
Thomas Jungblut commented on HAMA-704:
--------------------------------------
So I want to share the overall plan now.
Our requirements are:
- Vertex and edge values (plus possible user-added state) change in every
superstep.
- A vertex can be active or inactive; once it has called voteToHalt() it stays
inactive until an incoming message re-activates it (see the sketch after this
list).
- Be as memory-efficient as possible; for future fault-tolerance needs it is
necessary to spill the changing parts to disk so the job can be restarted.
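To make the second requirement concrete, here is a minimal sketch of the
activity semantics, using hypothetical names rather than the actual Hama API:
an inactive vertex is skipped unless a message arrives for it, and a vertex
deactivates itself via voteToHalt().
{code:java}
import java.util.List;
import java.util.Map;

abstract class SketchVertex<M> {
  boolean active = true;

  abstract void compute(Iterable<M> messages);

  // Called by the vertex itself when it has nothing more to do.
  void voteToHalt() {
    active = false;
  }
}

class SuperstepRunner<M> {
  void runSuperstep(List<SketchVertex<M>> vertices,
                    Map<Integer, List<M>> messagesByIndex) {
    for (int i = 0; i < vertices.size(); i++) {
      SketchVertex<M> v = vertices.get(i);
      List<M> msgs = messagesByIndex.getOrDefault(i, List.of());
      if (!msgs.isEmpty()) {
        v.active = true; // a message re-activates a halted vertex
      }
      if (v.active) {
        v.compute(msgs);
      }
    }
  }
}
{code}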
So how can we tackle this with an architecture?
*The graph part*
We divide the graph into two parts:
1. A static graph that only contains the IDs, sorted ascending by vertex ID,
each vertex ID followed by its outgoing edge IDs
(VertexID1, EdgeID1, EdgeID2, VertexID2, EdgeID5, ...).
2. A changing graph (the soft/liquid graph) that contains the soft attributes
which change very often. Its format parallels the static graph, so both can be
read in lockstep (sketched below):
(VertexValue1, EdgeValue1, EdgeValue2, VertexValue2, EdgeValue5, ...).
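A small sketch of how the two files could be written in parallel. The concrete
types (long IDs, double values) and the out-degree field are assumptions for
illustration only; the degree is written here just so a reader of the static
file knows how many edge IDs follow each vertex ID.
{code:java}
import java.io.DataOutputStream;
import java.io.IOException;

class GraphWriterSketch {
  void writeVertex(DataOutputStream staticOut, DataOutputStream softOut,
                   long vertexId, double vertexValue,
                   long[] edgeIds, double[] edgeValues) throws IOException {
    staticOut.writeLong(vertexId);       // static: vertex id
    softOut.writeDouble(vertexValue);    // soft:   vertex value (same position)
    staticOut.writeInt(edgeIds.length);  // static: out-degree (assumed encoding)
    for (int i = 0; i < edgeIds.length; i++) {
      staticOut.writeLong(edgeIds[i]);    // static: edge target id
      softOut.writeDouble(edgeValues[i]); // soft:   edge value (same position)
    }
  }
}
{code}
Because every ID in the static file has a value at the same logical position in
the soft file, a reader can advance both streams together in one forward pass.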
Both files are on disk. The static graph must be written while the vertices are
received from the partitioning and sorted. The soft part is written every
superstep to disk (or direct memory, or whatever sink the VerticesInfo defines)
after each vertex has computed its new state.
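A hypothetical sink abstraction (not the real VerticesInfo API) illustrating
how the changing soft state could be flushed to a pluggable target after each
vertex computes its new state:
{code:java}
import java.io.IOException;

// Implementations could target a local file, direct memory, or any other sink.
interface SoftStateSink<V, E> {
  void startSuperstep(long superstep) throws IOException;
  void writeVertexState(V vertexValue, E[] edgeValues) throws IOException;
  void finishSuperstep() throws IOException;
}
{code}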
Therefore we need some additional bookkeeping. For the activeness of a vertex
we can use a BitSet with an index mapping (we can index-map everything because
the graph is sorted), e.g. bit 1 set means vertex1 is active. To seek to the
next active vertex without deserializing the whole graph, we need another index
mapping: a long[] that contains the byte offset where the values of the vertex
at the given index start. This is mainly for seeking purposes, so we can jump
to the next active vertex in the set without deserializing the part of the
graph in between. Anyway, this has very low memory overhead (pretty much none)
and it scales extremely well, so we can make GraphChi look like a real loser.
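A sketch of that bookkeeping, with assumed names: a BitSet marks which vertex
indices are active, and a parallel long[] holds the byte offset where each
vertex's soft values start, so the reader can seek straight to the next active
vertex.
{code:java}
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.BitSet;

class ActiveVertexSeeker {
  private final BitSet active;       // bit i set => vertex at index i is active
  private final long[] softOffsets;  // byte offset of vertex i in the soft file
  private final RandomAccessFile softFile;

  ActiveVertexSeeker(BitSet active, long[] softOffsets,
                     RandomAccessFile softFile) {
    this.active = active;
    this.softOffsets = softOffsets;
    this.softFile = softFile;
  }

  /** Seeks to the next active vertex at or after fromIndex; returns its index,
      or -1 if none is left in this superstep. */
  int seekToNextActive(int fromIndex) throws IOException {
    int next = active.nextSetBit(fromIndex);
    if (next >= 0) {
      softFile.seek(softOffsets[next]); // skip all inactive vertices in between
    }
    return next;
  }
}
{code}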
Suraj and I talked about it, and he is going to make the messaging more
scalable, while I focus on the graph storage according to the plan above. Our
interface between the two is a message queue sorted ascending by vertex ID.
While Suraj is working on it, I will use the SortedMessagingQueue in memory to
simulate the behaviour.
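A minimal in-memory stand-in for that contract (the actual SortedMessagingQueue
API may differ): messages come back grouped by vertex ID in ascending order,
which lets the graph side merge them with the sorted vertex files in a single
forward pass.
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class InMemorySortedMessageQueue<M> {
  private final TreeMap<Long, List<M>> messagesByVertexId = new TreeMap<>();

  void add(long vertexId, M message) {
    messagesByVertexId.computeIfAbsent(vertexId, k -> new ArrayList<>())
                      .add(message);
  }

  /** Iterates entries ascending by vertex ID, matching the sorted graph files. */
  Iterable<Map.Entry<Long, List<M>>> ascendingEntries() {
    return messagesByVertexId.entrySet();
  }
}
{code}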
> Optimization of memory usage during message processing
> ------------------------------------------------------
>
> Key: HAMA-704
> URL: https://issues.apache.org/jira/browse/HAMA-704
> Project: Hama
> Issue Type: Improvement
> Components: graph
> Reporter: Edward J. Yoon
> Assignee: Edward J. Yoon
> Priority: Critical
> Fix For: 0.6.1
>
> Attachments: HAMA-704.patch-v1, hama-704_v05.patch,
> HAMA-704-v2.patch, localdisk.patch, mytest.patch, patch.txt, patch.txt,
> removeMsgMap.patch
>
>
> The <vertex, message> map seems to consume a lot of memory. We should figure
> out an efficient way to reduce memory usage.