Avery Ching commented on GIRAPH-45:
Claudio, thanks for your response.
I agree with your points about HDFS being expensive. I think the only
advantage of HDFS is convenience: the reducers can easily get those
files. After thinking about it more, it probably makes more sense to simply
send the messages to the destination worker and do the storage at the
destination worker. This would allow the destination to process those messages
whenever it wants, and the destination worker can do the in-memory aggregation
and dump to disk when it comes under memory pressure. I believe storing the
messages on the sender complicates things. It is simpler for the sender to send
its messages out when it is under memory pressure.
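A rough sketch of the destination-side idea, for illustration only. The class
SpillingMessageStore, its methods, the long vertex ids, the double messages,
and the message-count threshold are all hypothetical and are not existing
Giraph code; a real implementation would likely track bytes rather than counts.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical destination-side store: buffers in memory, spills past a limit. */
class SpillingMessageStore {
    private final Map<Long, List<Double>> inMemory = new HashMap<>();
    private final Path spillFile;
    private final int maxBuffered; // assumed threshold, counted in messages
    private int buffered = 0;

    SpillingMessageStore(Path spillFile, int maxBuffered) {
        this.spillFile = spillFile;
        this.maxBuffered = maxBuffered;
    }

    /** Called when a message arrives from a sending worker. */
    void receive(long vertexId, double message) throws IOException {
        inMemory.computeIfAbsent(vertexId, k -> new ArrayList<>()).add(message);
        if (++buffered >= maxBuffered) {
            spill();
        }
    }

    /** Appends buffered (vertexId, message) pairs to disk, unsorted, then frees memory. */
    private void spill() throws IOException {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(
                spillFile, StandardOpenOption.CREATE, StandardOpenOption.APPEND))) {
            for (Map.Entry<Long, List<Double>> e : inMemory.entrySet()) {
                for (double m : e.getValue()) {
                    out.writeLong(e.getKey());
                    out.writeDouble(m);
                }
            }
        }
        inMemory.clear();
        buffered = 0;
    }

    /** Merges spilled and still-buffered messages into per-vertex lists. */
    Map<Long, List<Double>> loadAll() throws IOException {
        Map<Long, List<Double>> result = new HashMap<>();
        if (Files.exists(spillFile)) {
            try (DataInputStream in = new DataInputStream(Files.newInputStream(spillFile))) {
                while (true) {
                    long id;
                    try {
                        id = in.readLong();
                    } catch (EOFException eof) {
                        break; // end of spill file
                    }
                    result.computeIfAbsent(id, k -> new ArrayList<>()).add(in.readDouble());
                }
            }
        }
        for (Map.Entry<Long, List<Double>> e : inMemory.entrySet()) {
            result.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).addAll(e.getValue());
        }
        return result;
    }
}
```

The sender stays simple in this arrangement: it only flushes, and all
buffering and spilling decisions live on the receiving worker.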
I think it would be nice to have n files such that n == # of partitions owned
by that worker. Then when loading and computing each partition, we load the
relevant messages for that partition and populate every vertex's message list.
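The per-partition file layout could look roughly like the following. This is
only a sketch under stated assumptions: PartitionMessageFiles, the file naming
scheme, and the modulo mapping from vertex id to partition are hypothetical,
and a real version would keep streams open instead of reopening the file for
every message.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical layout: one unsorted message file per partition owned by this worker. */
class PartitionMessageFiles {
    private final Path dir;
    private final int numPartitions;

    PartitionMessageFiles(Path dir, int numPartitions) {
        this.dir = dir;
        this.numPartitions = numPartitions;
    }

    /** Assumed mapping from vertex id to local partition (simple modulo). */
    int partitionOf(long vertexId) {
        return (int) Math.floorMod(vertexId, (long) numPartitions);
    }

    private Path fileFor(int partition) {
        return dir.resolve("messages-partition-" + partition + ".bin");
    }

    /** Appends one message to the file of the partition that owns the vertex. */
    void append(long vertexId, double message) throws IOException {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(
                fileFor(partitionOf(vertexId)),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND))) {
            out.writeLong(vertexId);
            out.writeDouble(message);
        }
    }

    /** Reads one partition's file and populates each vertex's message list. */
    Map<Long, List<Double>> loadPartition(int partition) throws IOException {
        Map<Long, List<Double>> messages = new HashMap<>();
        Path file = fileFor(partition);
        if (!Files.exists(file)) {
            return messages; // no messages were sent to this partition
        }
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            while (true) {
                long id;
                try {
                    id = in.readLong();
                } catch (EOFException eof) {
                    break; // end of file
                }
                messages.computeIfAbsent(id, k -> new ArrayList<>()).add(in.readDouble());
            }
        }
        return messages;
    }
}
```

Because each file is read in full when its partition is loaded, the records
never need to be ordered on disk.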
I am wondering why you need a BTree. We don't need to sort the messages.
I think that the memory management of the partitions can be done orthogonally.
I'll open another JIRA. No need to rush on the messaging improvement. I've
realized that by streaming the messages as Dmitriy suggested in combination
with a combiner executed on the destination worker, memory usage can be held
somewhat at bay for lots of applications. Still, storing the messages
out-of-core will be important for large graphs.
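To illustrate why a destination-side combiner keeps memory in check: with a
combiner, the receiver holds at most one value per vertex no matter how many
messages stream in. The sketch below is hypothetical (CombiningReceiver is not
a Giraph class, and the combiner is modeled as a plain BinaryOperator).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

/** Hypothetical receiver that combines messages per vertex as they arrive. */
class CombiningReceiver<M> {
    private final Map<Long, M> combined = new HashMap<>();
    private final BinaryOperator<M> combiner;

    CombiningReceiver(BinaryOperator<M> combiner) {
        this.combiner = combiner;
    }

    /** Folds the incoming message into the single stored value for the vertex. */
    void receive(long vertexId, M message) {
        combined.merge(vertexId, message, combiner);
    }

    /** Returns the combined message for a vertex, or null if none arrived. */
    M get(long vertexId) {
        return combined.get(vertexId);
    }

    /** Number of vertices that currently hold a combined message. */
    int size() {
        return combined.size();
    }
}
```

For example, a shortest-paths style application could pass Double::min as the
combiner. Applications without an associative, commutative combiner get no
such bound, which is where out-of-core storage still matters.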
> Improve the way to keep outgoing messages
> Key: GIRAPH-45
> URL: https://issues.apache.org/jira/browse/GIRAPH-45
> Project: Giraph
> Issue Type: Improvement
> Components: bsp
> Reporter: Hyunsik Choi
> Assignee: Hyunsik Choi
> As discussed in GIRAPH-12 (http://goo.gl/CE32U), I think there is a
> potential out-of-memory problem when the rate of message generation
> is higher than the rate of message flush (or network bandwidth).
> To overcome this problem, we need a more eager strategy for message flushing or
> some approach to spill messages to disk.
> The below link is Dmitriy's suggestion.