Avery Ching commented on GIRAPH-45:

Claudio, thanks for your response.

I agree with your points about HDFS being expensive.  I think the only 
advantage of HDFS is convenience: the reducers can easily get those 
files.  After thinking about it more, it probably makes more sense to simply 
send the messages to the destination worker and do the storage at the 
destination worker.  This would allow the destination worker to process those 
messages whenever it wants, and it can do the in-memory aggregation and dump 
to disk when memory pressure gets too high.  I believe storing the messages 
on the sender complicates things.  It is simpler for the sender to send its 
messages out when it is under memory pressure.
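
To illustrate the destination-side storage idea, here is a minimal Java 
sketch.  It is not Giraph code: the names Msg and SpillingMessageStore are 
hypothetical, messages are assumed to be a (long vertex id, double value) 
pair, and "memory pressure" is approximated by a simple message-count 
threshold.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

/** Hypothetical message type: a target vertex id plus a double payload. */
class Msg {
    final long targetVertex;
    final double value;
    Msg(long targetVertex, double value) {
        this.targetVertex = targetVertex;
        this.value = value;
    }
}

/** Hypothetical destination-side store that spills to disk past a threshold. */
class SpillingMessageStore {
    private final int maxInMemory;   // assumed stand-in for "memory pressure"
    private final Path spillFile;
    private final List<Msg> buffer = new ArrayList<>();
    private int spilledCount = 0;

    SpillingMessageStore(int maxInMemory) {
        this.maxInMemory = maxInMemory;
        try {
            this.spillFile = Files.createTempFile("messages", ".spill");
            this.spillFile.toFile().deleteOnExit();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Called for every incoming message; spills once the buffer is full. */
    void receive(Msg m) {
        buffer.add(m);
        if (buffer.size() >= maxInMemory) {
            spill();
        }
    }

    /** Appends the buffered messages to the spill file and clears the buffer. */
    private void spill() {
        try (DataOutputStream out = new DataOutputStream(new BufferedOutputStream(
                Files.newOutputStream(spillFile, StandardOpenOption.APPEND)))) {
            for (Msg m : buffer) {
                out.writeLong(m.targetVertex);
                out.writeDouble(m.value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        spilledCount += buffer.size();
        buffer.clear();
    }

    /** Returns spilled messages (in arrival order) followed by buffered ones. */
    List<Msg> readAll() {
        List<Msg> all = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new BufferedInputStream(
                Files.newInputStream(spillFile)))) {
            for (int i = 0; i < spilledCount; i++) {
                all.add(new Msg(in.readLong(), in.readDouble()));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        all.addAll(buffer);
        return all;
    }

    int inMemoryCount() { return buffer.size(); }
    int spilledCount() { return spilledCount; }
}
```

The sender stays simple in this arrangement: it only flushes its outgoing 
messages, and all buffering and spilling decisions are made by the worker 
that will eventually consume them.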

I think it would be nice to have n files such that n == # of partitions owned 
by that worker.  Then when loading and computing each partition, we load the 
relevant messages for that partition and populate every vertex's message list.  
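
A rough sketch of the one-file-per-partition layout, again in Java and again 
not Giraph code.  The class PartitionMessageFiles, the modulo partitioning in 
partitionOf, and the (long, double) message encoding are all assumptions made 
for the example.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

/** Hypothetical store keeping one message file per partition owned by a worker. */
class PartitionMessageFiles {
    private final Path dir;
    private final int numPartitions;

    PartitionMessageFiles(int numPartitions) {
        this.numPartitions = numPartitions;
        try {
            this.dir = Files.createTempDirectory("partition-messages");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Assumed partitioning scheme: vertex id modulo the partition count. */
    int partitionOf(long vertexId) {
        return (int) Math.floorMod(vertexId, (long) numPartitions);
    }

    private Path fileFor(int partition) {
        return dir.resolve("partition-" + partition + ".msgs");
    }

    /** Appends one message to the file of the partition owning the target vertex.
     *  Reopening the file per message keeps the sketch short; a real store
     *  would batch writes. */
    void append(long targetVertex, double value) {
        Path f = fileFor(partitionOf(targetVertex));
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(
                f, StandardOpenOption.CREATE, StandardOpenOption.APPEND))) {
            out.writeLong(targetVertex);
            out.writeDouble(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Loads one partition's messages and groups them into per-vertex lists. */
    Map<Long, List<Double>> load(int partition) {
        Map<Long, List<Double>> byVertex = new HashMap<>();
        Path f = fileFor(partition);
        if (!Files.exists(f)) {
            return byVertex;   // no messages were sent to this partition
        }
        try (DataInputStream in = new DataInputStream(new BufferedInputStream(
                Files.newInputStream(f)))) {
            while (true) {
                long vertex;
                try {
                    vertex = in.readLong();
                } catch (EOFException endOfFile) {
                    break;
                }
                byVertex.computeIfAbsent(vertex, k -> new ArrayList<>())
                        .add(in.readDouble());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return byVertex;
    }
}
```

Note that messages are only grouped by vertex when a partition is loaded, so 
nothing on disk has to be kept in sorted order.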

I am wondering why you need a BTree.  We don't need to sort the messages.

I think that the memory management of the partitions can be done orthogonally.  
I'll open another JIRA.  No need to rush on the messaging improvement.  I've 
realized that by streaming the messages as Dmitriy suggested in combination 
with a combiner executed on the destination worker, memory usage can be held 
somewhat at bay for lots of applications.  Still, storing the messages 
out-of-core will be important for large graphs.
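
As a sketch of why a destination-side combiner helps, the Java snippet below 
folds each streamed message into a single value per target vertex.  
CombiningInbox is a hypothetical name, not Giraph's combiner API, and the 
double-valued messages are an assumption for the example.

```java
import java.util.*;
import java.util.function.DoubleBinaryOperator;

/** Hypothetical inbox that combines messages as they arrive at the destination. */
class CombiningInbox {
    private final Map<Long, Double> combined = new HashMap<>();
    private final DoubleBinaryOperator combiner;

    CombiningInbox(DoubleBinaryOperator combiner) {
        this.combiner = combiner;
    }

    /** Folds the incoming value into the one value kept for the target vertex. */
    void receive(long targetVertex, double value) {
        combined.merge(targetVertex, value,
                (oldValue, newValue) -> combiner.applyAsDouble(oldValue, newValue));
    }

    /** Combined value for a vertex, or null if it received no messages. */
    Double messageFor(long targetVertex) {
        return combined.get(targetVertex);
    }

    /** Number of values held; bounded by the number of destination vertices. */
    int size() {
        return combined.size();
    }
}
```

With a combiner such as Math::min or Double::sum, memory grows with the 
number of destination vertices rather than with the number of messages.  
Applications without a usable combiner get no such bound.
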
> Improve the way to keep outgoing messages
> -----------------------------------------
>                 Key: GIRAPH-45
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-45
>             Project: Giraph
>          Issue Type: Improvement
>          Components: bsp
>            Reporter: Hyunsik Choi
>            Assignee: Hyunsik Choi
> As discussed in GIRAPH-12 (http://goo.gl/CE32U), I think there is a 
> potential out-of-memory problem when the rate of message generation 
> is higher than the rate of message flushing (or the network bandwidth).
> To overcome this problem, we need a more eager strategy for message flushing 
> or some approach to spill messages to disk.
> The link below is Dmitriy's suggestion.
> https://issues.apache.org/jira/browse/GIRAPH-12?focusedCommentId=13116253&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13116253
