Avery Ching commented on GIRAPH-45:
I think that reading messages one vertex at a time from disk will reduce memory
pressure more than the partition-based storage. I'm assuming that
key=vertex_id and value=message_list in your explanation. How do you keep the
keys together in the file? For instance, suppose that you get the following
tuples <vertex_id, message_list>:
<0, 2.0, 3.0>
In a bad scenario, you have to spill to disk after each tuple. The files
are totally out of order and your index <vertex, byte offset> looks something
But if I'm understanding this scheme, wouldn't each vertex need to scan the
entire file if the vertices keep coming and are totally random?
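A minimal sketch of that worst case, assuming an append-only spill file and an in-memory index that maps each vertex id to the byte offsets of its spilled tuples. The class SpillFile, its methods, and the pickle-based record format are hypothetical and only illustrate the concern; they are not Giraph code and may not match the proposed scheme.

```python
import pickle
import tempfile
from collections import defaultdict


class SpillFile:
    """Append-only spill file plus an index: vertex_id -> list of byte offsets."""

    def __init__(self):
        self._file = tempfile.TemporaryFile()  # binary read/write by default
        self.index = defaultdict(list)
        self.seeks = 0  # number of random reads performed so far

    def spill(self, vertex_id, messages):
        """Append one <vertex_id, message_list> tuple and record its offset."""
        self._file.seek(0, 2)  # move to the end of the file
        self.index[vertex_id].append(self._file.tell())
        pickle.dump(messages, self._file)

    def read(self, vertex_id):
        """Collect all messages for one vertex: one seek per spilled tuple."""
        result = []
        for offset in self.index.get(vertex_id, []):
            self._file.seek(offset)
            self.seeks += 1
            result.extend(pickle.load(self._file))
        return result


spill = SpillFile()
for vertex_id, messages in [(0, [2.0, 3.0]), (5, [1.0]), (0, [4.0]), (0, [5.0])]:
    spill.spill(vertex_id, messages)
print(spill.read(0), spill.seeks)
```

With tuples arriving in random order, the index holds one offset per spilled tuple rather than one per vertex, and reading a single vertex turns into as many random seeks as it has fragments.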
I suppose that another way to do this is to use the partition-based method with
a small change. If the partition is deemed too large to load into memory and
sort, it could be read and re-dumped into n files, where n is chosen so that
each file has a good chance of being small enough to fit in memory on its own.
This can be done recursively.
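The recursive re-dump could look roughly like the sketch below. It assumes integer vertex ids, tuples pickled back to back in each file, and a byte limit standing in for "fits in memory". The names read_tuples, split_until_fits and load_sorted, the bucketing rule, and the max_depth guard are all hypothetical, not part of Giraph.

```python
import os
import pickle
import tempfile


def read_tuples(path):
    """Yield (vertex_id, message_list) tuples pickled back to back in path."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return


def split_until_fits(path, max_bytes, n=4, depth=0, max_depth=8):
    """Return files of at most max_bytes that together hold path's tuples.

    All tuples for one vertex id land in the same output file. The max_depth
    guard stops the recursion when a file cannot be split any further
    (for example, one vertex with a very long message list).
    """
    if os.path.getsize(path) <= max_bytes or depth >= max_depth:
        return [path]
    out_dir = tempfile.mkdtemp(prefix="split_")
    paths = [os.path.join(out_dir, "part-%d" % i) for i in range(n)]
    files = [open(p, "wb") for p in paths]
    try:
        for vertex_id, messages in read_tuples(path):
            # Use a different base-n "digit" of the id at each depth so that
            # a recursive split really separates the vertices further.
            bucket = (vertex_id // n ** depth) % n
            pickle.dump((vertex_id, messages), files[bucket])
    finally:
        for f in files:
            f.close()
    result = []
    for p in paths:
        if os.path.getsize(p) > 0:
            result.extend(split_until_fits(p, max_bytes, n, depth + 1, max_depth))
    return result


def load_sorted(path):
    """Load one small file, merge message lists per vertex, sort by vertex id."""
    merged = {}
    for vertex_id, messages in read_tuples(path):
        merged.setdefault(vertex_id, []).extend(messages)
    return sorted(merged.items())
```

Each file returned by split_until_fits can then be passed to load_sorted one at a time, so only one small file is held in memory at once. The choice of n trades the number of open files against the number of recursive passes.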
> Improve the way to keep outgoing messages
> Key: GIRAPH-45
> URL: https://issues.apache.org/jira/browse/GIRAPH-45
> Project: Giraph
> Issue Type: Improvement
> Components: bsp
> Reporter: Hyunsik Choi
> Assignee: Hyunsik Choi
> As discussed in GIRAPH-12 (http://goo.gl/CE32U), I think that there is a
> potential out-of-memory problem when the rate of message generation
> is higher than the rate of message flush (or network bandwidth).
> To overcome this problem, we need a more eager strategy for message flushing or
> some approach to spill messages to disk.
> The link below is Dmitriy's suggestion.