[ 
https://issues.apache.org/jira/browse/HAMA-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584246#comment-13584246
 ] 

Thomas Jungblut commented on HAMA-704:
--------------------------------------

bq.Create an array of say 8 or 16 spilling queues, basically partitioning the 
messages into vertice corresponding buckets. So spilling queue[0] would be 
responsible for messages for first numVertices/16 vertices, queues[1] for the 
next numVertices/16. 

That is cool, you just have to make sure that they are sorted and 
getCurrentMessage returns them in this order. Nothing much else is needed ;-)
Your approach has a bit of a problem, as the currentMessage must be the lowest 
in all the received messages. So a heap structure wouldn't be wrong in this 
case although very difficult to maintain the heap property on disk. Your 
bucketing must ensure that the order is correct, otherwise sorting the buckets 
in itself won't yield a correct working algorithm.

bq.For sorted spilling queue I would have to sort throughout the whole messages.

I guess we can't get over that easily. But I told you in chat already that 
sorting on sender side and merging on receiver side is in this case better and 
easier to implement.

bq.I do understand the messages don't have uniform distribution across vertices 
and we don't need a complete B-tree implementation.

The good thing is that a b-tree is maintaining order over all keys, so you can 
iterate messages with minimal number of disk seeks.

The message distribution is related to the number of active vertices- in case 
of pagerank every vertex is active for all iterations (that is why I benchmark 
it usually).
We track how many vertices are active, so we can give a hint to the queue how 
much it should cache. 

I guess we shouldn't get that fancy for the first shot, improvement can always 
be done later on.
                
> Optimization of memory usage during message processing
> ------------------------------------------------------
>
>                 Key: HAMA-704
>                 URL: https://issues.apache.org/jira/browse/HAMA-704
>             Project: Hama
>          Issue Type: Improvement
>          Components: graph
>            Reporter: Edward J. Yoon
>            Assignee: Edward J. Yoon
>            Priority: Critical
>             Fix For: 0.6.1
>
>         Attachments: HAMA-704_1337.patch, HAMA-704_1338.patch, 
> HAMA-704.patch-v1, hama-704_v05.patch, HAMA-704-v2.patch, localdisk.patch, 
> mytest.patch, patch.txt, patch.txt, removeMsgMap.patch
>
>
> <vertex, message> map seems consume a lot of memory. We should figure out an 
> efficient way to reduce memory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to