[ https://issues.apache.org/jira/browse/HAMA-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582292#comment-13582292 ]

Thomas Jungblut edited comment on HAMA-704 at 2/20/13 4:33 PM:
---------------------------------------------------------------

VertexID in the vertex must be comparable. That is actually enough for 
everything.

I was just profiling the suspected memory leak using 1 million PageRank vertices
with 10 edges each (50 MB of input).
Here is a much more detailed memory analysis:

_After reading vertices into RAM in setup (superstep 2)_

600 MB raw heap usage.
418,199,256 bytes occupied by the vertices.
287,999,928 bytes occupied by Text objects (48,000,000 bytes of these are vertex
keys; the rest belong to the edges).
237,999,192 bytes occupied by Edge objects (Text objects and null references).
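To put the setup figures on a per-object scale, here is a small sketch that divides each bucket by the number of vertices or edges. It assumes exactly 1,000,000 vertices and 10,000,000 edges (1 mio. vertices x 10 edges) and only reports averages; the profile does not say how the individual objects are attributed to each bucket. The class and method names are made up for illustration.

```java
import java.util.Locale;

public class SetupHeapAverages {
    // Test setup from the comment: 1 mio. vertices with 10 out-edges each (assumed exact)
    static final long VERTICES = 1_000_000L;
    static final long EDGES = VERTICES * 10L;

    // Byte counts copied from the heap profile after setup
    static final long VERTEX_BYTES = 418_199_256L;
    static final long TEXT_BYTES = 287_999_928L;
    static final long KEY_TEXT_BYTES = 48_000_000L;
    static final long EDGE_OBJECT_BYTES = 237_999_192L;

    // Average bytes per vertex / per edge for a given bucket
    static double perVertex(long bytes) { return (double) bytes / VERTICES; }
    static double perEdge(long bytes) { return (double) bytes / EDGES; }

    public static void main(String[] args) {
        // Locale.ROOT so the decimal separator is always '.'
        System.out.println(String.format(Locale.ROOT, "vertex objects: %.1f bytes/vertex", perVertex(VERTEX_BYTES)));
        System.out.println(String.format(Locale.ROOT, "key Text:       %.1f bytes/vertex", perVertex(KEY_TEXT_BYTES)));
        System.out.println(String.format(Locale.ROOT, "edge Text:      %.1f bytes/edge", perEdge(TEXT_BYTES - KEY_TEXT_BYTES)));
        System.out.println(String.format(Locale.ROOT, "Edge objects:   %.1f bytes/edge", perEdge(EDGE_OBJECT_BYTES)));
    }
}
```

This prints 418.2 bytes per vertex object, 48.0 bytes per vertex key, and about 24.0 and 23.8 bytes per edge for the edge Text and Edge buckets respectively.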
 
_In the first superstep_

1.5 GB heap.
Vertex memory stays constant. The messages break down as follows:
5 million GraphJobMessages (covering only half of the out-edges) take 225 MB. So
with all messages, this sums up to a bit less than 500 MB (10 times the graph size!).
Each vertex message takes ~40 bytes: ~20 for the Text and ~20 for the DoubleWritable.
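The "a bit less than 500 MB" estimate is an extrapolation from the measured half of the messages. A minimal sketch of that arithmetic, assuming one PageRank message per out-edge (10 million in total) and reading "MB" as decimal megabytes; the names are hypothetical:

```java
import java.util.Locale;

public class MessageBudgetSketch {
    static final long EDGES = 1_000_000L * 10L;        // assumption: one message per out-edge
    static final long GRAPH_BYTES = 50_000_000L;        // "50mb" input, read as decimal MB
    static final long MEASURED_MESSAGES = 5_000_000L;   // half of the out-edges
    static final long MEASURED_BYTES = 225_000_000L;    // "225mb", read as decimal MB

    // Average heap cost of one GraphJobMessage in the profile
    static long bytesPerMessage() { return MEASURED_BYTES / MEASURED_MESSAGES; }

    // Extrapolated cost once every out-edge has produced its message
    static long allMessageBytes() { return bytesPerMessage() * EDGES; }

    // How many times larger the messages are than the input graph
    static double ratioToGraph() { return (double) allMessageBytes() / GRAPH_BYTES; }

    public static void main(String[] args) {
        System.out.println(String.format(Locale.ROOT, "%d bytes/message, %d bytes total, %.1fx the input",
                bytesPerMessage(), allMessageBytes(), ratioToGraph()));
        // prints "45 bytes/message, 450000000 bytes total, 9.0x the input"
    }
}
```

The measured average of 45 bytes per message is in line with the ~40-byte payload estimate, and the extrapolated 450 MB is roughly nine to ten times the 50 MB input.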

_In the fourth superstep (of 6 in total)_

After GC, the heap is back down to 1.1 GB.
The BSPMessageBundle contains 4.1 million messages and is held in memory only once.
However, the linked lists inside the bundle's hash map hold 100 MB of data.
Maybe we can switch back to an ArrayList: it is much more compact in memory
because it isn't doubly linked. We should also release the reference to the
bundle once it has been sent via RPC.
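A rough sketch of both suggestions, assuming a simplified bundle keyed by destination peer. This is not Hama's actual BSPMessageBundle; `CompactMessageBundle`, `sendAndRelease` and the `BiConsumer` standing in for the RPC call are all hypothetical. Each peer gets an `ArrayList`, which stores elements in a backing array instead of allocating a node with prev/next pointers per element as `java.util.LinkedList` does, and the bundle drops its references after sending.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

public class CompactBundleSketch {
    /** Hypothetical, simplified bundle: one ArrayList of messages per destination peer. */
    static final class CompactMessageBundle<M> {
        private Map<String, List<M>> byPeer = new HashMap<>();

        void add(String peer, M msg) {
            // ArrayList: no per-element node object, unlike a doubly linked list
            byPeer.computeIfAbsent(peer, k -> new ArrayList<>()).add(msg);
        }

        int size() {
            int n = 0;
            for (List<M> l : byPeer.values()) n += l.size();
            return n;
        }

        /** Hands each peer's list to the transport (stand-in for RPC), then drops the references. */
        void sendAndRelease(BiConsumer<String, List<M>> transport) {
            for (Map.Entry<String, List<M>> e : byPeer.entrySet()) {
                transport.accept(e.getKey(), e.getValue());
            }
            // Replace the map so the old lists and hash table are no longer reachable from the bundle
            byPeer = new HashMap<>();
        }
    }

    public static void main(String[] args) {
        CompactMessageBundle<Double> bundle = new CompactMessageBundle<>();
        bundle.add("peer-1", 0.25);
        bundle.add("peer-1", 0.5);
        bundle.add("peer-2", 1.0);
        int[] sent = {0};
        bundle.sendAndRelease((peer, msgs) -> sent[0] += msgs.size());
        System.out.println(sent[0] + " " + bundle.size()); // prints "3 0"
    }
}
```

Whether the memory is actually freed then depends on the transport: if the RPC layer keeps its own reference to the lists, they stay reachable until it is done with them.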


However, everything is garbage-collected properly, so in my opinion there is no
memory leak.

BTW: is it intended that VerticesInfo does a linear search for every vertex
lookup? That is slow as hell.
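Since vertex IDs are required to be comparable anyway, one alternative to a linear scan is to sort the vertices once after loading and look them up by binary search, in O(log n) instead of O(n). The sketch below assumes the vertex set no longer changes after sorting; `Vertex`, `SortedVertexIndex` and `find` are hypothetical names, and `String` stands in for Hama's Writable vertex ID types.

```java
import java.util.ArrayList;
import java.util.List;

public class SortedVertexLookup {
    /** Hypothetical minimal vertex: only the comparable ID matters here. */
    static final class Vertex<I extends Comparable<I>> {
        final I id;
        Vertex(I id) { this.id = id; }
    }

    /** Sorts the vertices by ID once, then answers lookups by binary search. */
    static final class SortedVertexIndex<I extends Comparable<I>> {
        private final List<Vertex<I>> vertices;

        SortedVertexIndex(List<Vertex<I>> input) {
            vertices = new ArrayList<>(input);
            vertices.sort((a, b) -> a.id.compareTo(b.id));
        }

        /** Returns the vertex with the given ID, or null if it is not present. */
        Vertex<I> find(I id) {
            int lo = 0;
            int hi = vertices.size() - 1;
            while (lo <= hi) {
                int mid = (lo + hi) >>> 1; // unsigned shift avoids int overflow
                int cmp = vertices.get(mid).id.compareTo(id);
                if (cmp < 0) lo = mid + 1;
                else if (cmp > 0) hi = mid - 1;
                else return vertices.get(mid);
            }
            return null;
        }
    }

    public static void main(String[] args) {
        List<Vertex<String>> vs = new ArrayList<>();
        for (String id : new String[] {"c", "a", "d", "b"}) vs.add(new Vertex<>(id));
        SortedVertexIndex<String> index = new SortedVertexIndex<>(vs);
        System.out.println(index.find("d").id); // prints "d"
        System.out.println(index.find("z"));    // prints "null"
    }
}
```

A `HashMap` from ID to vertex would give constant-time lookups as well, but a sorted list relies only on the comparability the comment already requires and avoids allocating a map entry per vertex.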
                
> Optimization of memory usage during message processing
> ------------------------------------------------------
>
>                 Key: HAMA-704
>                 URL: https://issues.apache.org/jira/browse/HAMA-704
>             Project: Hama
>          Issue Type: Improvement
>          Components: graph
>            Reporter: Edward J. Yoon
>            Assignee: Edward J. Yoon
>            Priority: Critical
>             Fix For: 0.6.1
>
>         Attachments: HAMA-704.patch-v1, hama-704_v05.patch, 
> HAMA-704-v2.patch, localdisk.patch, mytest.patch, patch.txt, patch.txt, 
> removeMsgMap.patch
>
>
> The <vertex, message> map seems to consume a lot of memory. We should figure out
> an efficient way to reduce it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
