Claudio Martella commented on GIRAPH-45:
In the current naive implementation (key=vertex_id, value=message), I keep an
in-memory SortedMap<I, Queue<M>> (a ConcurrentSkipListMap). When the map is
under memory pressure I flush it to disk as a new sorted file with its own
BTree index and its own BloomFilter. This means I may end up with multiple
SequenceFiles once the messages from other peers have been collected (that is,
at the beginning of each superstep).
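
A minimal sketch of the buffer-and-flush behaviour described above, not the
actual patch. The class name SpillingMessageBuffer, the message-count
threshold standing in for "memory pressure", and the in-memory TreeMap
standing in for each on-disk file are all assumptions made for illustration;
a real flush would write a sorted file together with its index and
BloomFilter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical sketch: buffers messages per vertex and spills sorted runs. */
class SpillingMessageBuffer<I extends Comparable<I>, M> {
    // Sorted in-memory map: vertex id -> queue of pending messages.
    private final ConcurrentSkipListMap<I, Queue<M>> buffer =
            new ConcurrentSkipListMap<>();
    // Each run stands in for one flushed file (assumption: kept in memory here).
    private final List<TreeMap<I, List<M>>> runs = new ArrayList<>();
    // Assumed stand-in for "memory pressure": a simple message count.
    private final int maxBufferedMessages;
    private int buffered = 0;

    SpillingMessageBuffer(int maxBufferedMessages) {
        this.maxBufferedMessages = maxBufferedMessages;
    }

    synchronized void add(I vertexId, M message) {
        buffer.computeIfAbsent(vertexId, k -> new ConcurrentLinkedQueue<>())
              .add(message);
        if (++buffered >= maxBufferedMessages) {
            flush();
        }
    }

    /** Append-only flush: the whole sorted map becomes one new run. */
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        TreeMap<I, List<M>> run = new TreeMap<>();
        for (Map.Entry<I, Queue<M>> e : buffer.entrySet()) {
            run.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        runs.add(run);
        buffer.clear();
        buffered = 0;
    }

    synchronized int runCount() { return runs.size(); }

    synchronized int bufferedCount() { return buffered; }
}
```

Because the whole map is flushed at once, a spill always writes a large
sorted batch rather than a few tuples.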
To read the messages for a vertex at compute() time, I ask each of these files
for its partial set of messages for that vertex. This means at most N seeks to
the blocks holding them, where N is the number of files, and that worst case
assumes all N files have data for the given vertex. The BloomFilter (and
partially the index as well) is there exactly to avoid seeks when they are not
necessary. Writing is append-only at flush.
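
The read path can be sketched roughly as follows. This is an illustration
only: SpilledRunReader, Run, the two hash functions and the 1024-bit filter
size are hypothetical, each "file" is an in-memory TreeMap, and the seeks
counter merely simulates a disk seek. It shows the filter being consulted
first so that files without data for the vertex are skipped.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the per-vertex read over N spilled runs. */
class SpilledRunReader {
    /** One spilled file: sorted contents plus a tiny Bloom filter over its keys. */
    static final class Run {
        private static final int BITS = 1024; // assumed filter size
        private final TreeMap<Long, List<String>> sorted = new TreeMap<>();
        private final BitSet bloom = new BitSet(BITS);
        int seeks = 0; // counts simulated disk seeks

        Run(Map<Long, List<String>> contents) {
            for (Map.Entry<Long, List<String>> e : contents.entrySet()) {
                sorted.put(e.getKey(), new ArrayList<>(e.getValue()));
                bloom.set(hash1(e.getKey()));
                bloom.set(hash2(e.getKey()));
            }
        }

        private static int hash1(long id) {
            return Math.floorMod(Long.hashCode(id), BITS);
        }

        private static int hash2(long id) {
            return Math.floorMod(Long.hashCode(id * 0x9E3779B97F4A7C15L), BITS);
        }

        /** False means "definitely absent"; true means "possibly present". */
        boolean mightContain(long id) {
            return bloom.get(hash1(id)) && bloom.get(hash2(id));
        }

        /** Simulated seek to the block holding this vertex's messages. */
        List<String> read(long id) {
            seeks++;
            List<String> found = sorted.get(id);
            return found == null ? new ArrayList<>() : found;
        }
    }

    /** Collects the partial message sets from every run that may hold the vertex. */
    static List<String> messagesFor(long vertexId, List<Run> runs) {
        List<String> all = new ArrayList<>();
        for (Run run : runs) {
            if (run.mightContain(vertexId)) { // skip the seek when the filter says no
                all.addAll(run.read(vertexId));
            }
        }
        return all;
    }
}
```

A Bloom filter can return false positives, so an occasional wasted seek is
still possible; it never causes messages to be missed.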
In the optimized implementation, key=vertex_id and value=messages, which is
going to be a bit more efficient to serialize and deserialize.
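
To make the difference concrete, here is a small sketch comparing the two
record layouts. The exact on-disk format is an assumption (a long vertex id,
int messages, and an int count prefix for the grouped form), and
MessageRecordLayouts is a hypothetical name; the point is only that the
grouped layout writes the vertex id once instead of once per message.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

/** Hypothetical comparison of per-message and per-vertex record layouts. */
class MessageRecordLayouts {
    /** Naive layout: one (vertex_id, message) record per message. */
    static byte[] writeNaive(long vertexId, List<Integer> messages) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int m : messages) {
                out.writeLong(vertexId); // key repeated for every message
                out.writeInt(m);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Grouped layout: vertex_id once, then a count and all its messages. */
    static byte[] writeGrouped(long vertexId, List<Integer> messages) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(vertexId);       // key written a single time
            out.writeInt(messages.size()); // assumed count prefix
            for (int m : messages) {
                out.writeInt(m);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

With three int messages for one vertex, the naive layout takes 3 * (8 + 4) =
36 bytes and the grouped layout takes 8 + 4 + 3 * 4 = 24 bytes under these
assumptions.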
So I'm never going to spill just a few tuples at a time. It really is a
simplified version of BigTable/HBase, where I take advantage of our particular
demands/constraints to simplify my life quite a lot (as I said: no random
reads, no updates/deletes, single reader).
> Improve the way to keep outgoing messages
> Key: GIRAPH-45
> URL: https://issues.apache.org/jira/browse/GIRAPH-45
> Project: Giraph
> Issue Type: Improvement
> Components: bsp
> Reporter: Hyunsik Choi
> Assignee: Hyunsik Choi
> As discussed in GIRAPH-12 (http://goo.gl/CE32U), I think there is a
> potential out-of-memory problem when the rate of message generation is
> higher than the rate of message flush (or network bandwidth).
> To overcome this problem, we need a more eager strategy for message flushing
> or some approach to spill messages to disk.
> The below link is Dmitriy's suggestion.