[jira] [Commented] (GIRAPH-322) Run Length Encoding for Vertex#sendMessageToAllEdges might curb out of control message growth in large scale jobs

Maja Kabiljo (JIRA) Tue, 11 Sep 2012 07:59:09 -0700

    [ 
https://issues.apache.org/jira/browse/GIRAPH-322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453083#comment-13453083
 ]


Maja Kabiljo commented on GIRAPH-322:
-------------------------------------

You are missing the default constructor for SendBroadcastMessageRequest. If you 
are testing with a smaller example, that could be the cause of the problem you 
described, since flush will be the first moment when messages get sent and 
processed. You should be able to see the exception, though, if this is the only 
problem.
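
To illustrate why the no-argument constructor matters, here is a minimal sketch. It assumes the receiving side creates an empty request object by reflection and then fills its fields from the wire; that mechanism is an assumption for illustration. BroadcastRequest, roundTrip and the field names are hypothetical stand-ins, not the classes in the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class DefaultConstructorSketch {
    /** Hypothetical request type; a stand-in, not Giraph's actual class. */
    public static class BroadcastRequest {
        private int partitionId;
        private String payload;

        /** No-arg constructor, so an empty instance can be created by reflection. */
        public BroadcastRequest() { }

        public BroadcastRequest(int partitionId, String payload) {
            this.partitionId = partitionId;
            this.payload = payload;
        }

        /** Serializes the fields in a fixed order. */
        public void write(DataOutput out) throws IOException {
            out.writeInt(partitionId);
            out.writeUTF(payload);
        }

        /** Reads the fields back in the same order they were written. */
        public void readFields(DataInput in) throws IOException {
            partitionId = in.readInt();
            payload = in.readUTF();
        }

        public int getPartitionId() { return partitionId; }
        public String getPayload() { return payload; }
    }

    /** Mimics a receiver: serialize, instantiate reflectively, then read the fields. */
    public static BroadcastRequest roundTrip(BroadcastRequest sent) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        sent.write(new DataOutputStream(bytes));
        // Throws NoSuchMethodException if the class has no no-arg constructor.
        BroadcastRequest received =
            BroadcastRequest.class.getDeclaredConstructor().newInstance();
        received.readFields(
            new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return received;
    }

    public static void main(String[] args) throws Exception {
        BroadcastRequest r = roundTrip(new BroadcastRequest(3, "hello"));
        System.out.println(r.getPartitionId() + " " + r.getPayload());
    }
}
```

If the no-arg constructor is removed from this sketch, the reflective call fails only when the first request is actually decoded, which would match a failure that first shows up at flush.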

Great work, I'm looking forward to hearing how it performs.

This is not a big deal, but when sendBroadcastMessageRequest is called you have 
all destinations grouped together, and then you add them one by one to the 
cache, where they are grouped together again.
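
A rough sketch of what handing the cache an already-grouped destination list could look like, so the payload is stored once per partition instead of being regrouped from single adds. BroadcastCacheSketch, addGrouped and the other names are hypothetical and only illustrate the idea; they are not the API of SendMessageCache or of the patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BroadcastCacheSketch {
    /** One payload plus every destination vertex in a partition that should receive it. */
    public static class Entry {
        final String message;
        final List<Long> destinations;

        Entry(String message, List<Long> destinations) {
            this.message = message;
            this.destinations = destinations;
        }
    }

    /** Cached entries keyed by partition id (hypothetical layout). */
    private final Map<Integer, List<Entry>> byPartition = new HashMap<>();

    /** Adds a whole destination group in one call; the payload is stored once. */
    public void addGrouped(int partitionId, String message, List<Long> destinations) {
        byPartition.computeIfAbsent(partitionId, k -> new ArrayList<>())
            .add(new Entry(message, new ArrayList<>(destinations)));
    }

    /** Number of payload copies held for a partition. */
    public int payloadCopies(int partitionId) {
        List<Entry> entries = byPartition.get(partitionId);
        return entries == null ? 0 : entries.size();
    }

    /** Total number of destination vertices held for a partition. */
    public int destinationCount(int partitionId) {
        int total = 0;
        List<Entry> entries = byPartition.get(partitionId);
        if (entries != null) {
            for (Entry entry : entries) {
                total += entry.destinations.size();
            }
        }
        return total;
    }
}
```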

Revisiting out-of-core to work better with this solution is not going to be 
straightforward, since we still want to keep the messages for one vertex 
grouped together in order to minimize random accesses. I'll think a bit about 
it and share anything I come up with.

One thing that doesn't seem good is that you are setting the number of 
partitions to 1 per worker. I think Avery already suggested something about 
modifying SendMessageCache to keep and send messages per worker, not per 
partition. So something like that could also be done with this solution.

Was your plan to also use the out-of-core graph in the end, or would the data 
fit in memory?

I'm really interested in hearing whether you make it work with out-of-core 
messages. I agree with what you said about benchmarks vs real use cases, but 
for out-of-core messages I don't see how it can get worse than 
RandomMessageBenchmark, since one can't generate messages at a higher rate 
than there :-) Maybe memory is leaking elsewhere. One thing you can try is to 
set the number of in-core messages and open requests to something really low. 
It will be extremely slow, but it would show whether it still crashes.
                
> Run Length Encoding for Vertex#sendMessageToAllEdges might curb out of 
> control message growth in large scale jobs
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-322
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-322
>             Project: Giraph
>          Issue Type: Improvement
>          Components: bsp
>    Affects Versions: 0.2.0
>            Reporter: Eli Reisman
>            Assignee: Eli Reisman
>            Priority: Minor
>             Fix For: 0.2.0
>
>         Attachments: GIRAPH-322-1.patch, GIRAPH-322-2.patch
>
>
> Vertex#sendMessageToAllEdges is a case that goes against the grain of the 
> data structures and code paths used to transport messages through a Giraph 
> application and out on the network. While messages to a single vertex can be 
> combined (and should be) in some applications that could make use of this 
> broadcast messaging, the out of control message growth of algorithms like 
> triangle closing means we need to de-duplicate messages bound for many 
> vertices/partitions.
> This will be an evolving solution (this first patch is just the first step) 
> and currently it does not present a robust solution for disk-spill message 
> stores. I figure I can get some advice about that or it can be a follow-up 
> JIRA if this turns out to be a fruitful pursuit. This first patch is also 
> Netty-only and simply defaults to the old sendMessagesToAllEdges() 
> implementation if USE_NETTY is false. All this can be cleaned up when we know 
> this works and/or is worth pursuing.
> The idea is to send as few broadcast messages as possible by run-length 
> encoding their delivery and only duplicating messages on the network when 
> they are bound for different partitions. This is also best when combined with 
> "-Dhash.userPartitionCount=# of workers" so you don't do too much of that.
> If this shows promise I will report back and keep working on this. As it is, 
> it represents an end-to-end solution, using Netty, for in-memory messaging. 
> It won't break with spill to disk, but you do lose the de-duplicating effect.
> More to follow, comments/ideas welcome. I expect this to change a lot as I 
> test it and ideas/suggestions crop up.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
