[
https://issues.apache.org/jira/browse/HAMA-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584246#comment-13584246
]
Thomas Jungblut commented on HAMA-704:
--------------------------------------
bq.Create an array of say 8 or 16 spilling queues, basically partitioning the
messages into vertice corresponding buckets. So spilling queue[0] would be
responsible for messages for first numVertices/16 vertices, queues[1] for the
next numVertices/16.
That is cool, you just have to make sure that they are sorted and
getCurrentMessage returns them in this order. Nothing much else is needed ;-)
Your approach has a bit of a problem, as the currentMessage must be the lowest
in all the received messages. So a heap structure wouldn't be wrong in this
case although very difficult to maintain the heap property on disk. Your
bucketing must ensure that the order is correct, otherwise sorting the buckets
in itself won't yield a correct working algorithm.
bq.For sorted spilling queue I would have to sort throughout the whole messages.
I guess we can't get over that easily. But I told you in chat already that
sorting on sender side and merging on receiver side is in this case better and
easier to implement.
bq.I do understand the messages don't have uniform distribution across vertices
and we don't need a complete B-tree implementation.
The good thing is that a b-tree is maintaining order over all keys, so you can
iterate messages with minimal number of disk seeks.
The message distribution is related to the number of active vertices- in case
of pagerank every vertex is active for all iterations (that is why I benchmark
it usually).
We track how many vertices are active, so we can give a hint to the queue how
much it should cache.
I guess we shouldn't get that fancy for the first shot, improvement can always
be done later on.
> Optimization of memory usage during message processing
> ------------------------------------------------------
>
> Key: HAMA-704
> URL: https://issues.apache.org/jira/browse/HAMA-704
> Project: Hama
> Issue Type: Improvement
> Components: graph
> Reporter: Edward J. Yoon
> Assignee: Edward J. Yoon
> Priority: Critical
> Fix For: 0.6.1
>
> Attachments: HAMA-704_1337.patch, HAMA-704_1338.patch,
> HAMA-704.patch-v1, hama-704_v05.patch, HAMA-704-v2.patch, localdisk.patch,
> mytest.patch, patch.txt, patch.txt, removeMsgMap.patch
>
>
> <vertex, message> map seems consume a lot of memory. We should figure out an
> efficient way to reduce memory.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira