[
https://issues.apache.org/jira/browse/GIRAPH-322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455061#comment-13455061
]
Eli Reisman commented on GIRAPH-322:
------------------------------------
The WorkerInfo remark refers to the fact that many WorkerInfos are referenced
needlessly inside PartitionOwners in the messaging code just to match each
worker up with a partition ID. One approach that worked well in my experiments
yesterday was to organize the messaging data structures around WorkerInfos
instead of PartitionOwners, and to use those to group related messages that can
be sent together in a bundle. The ClientRequestId would need a host/port hash
from WorkerInfo (or something similar) in place of a partitionId for
sequencing, but that's a small change. This wouldn't be an amazing fix, but it
would give us finer-grained control over how we bundle groups of messages
between nodes, and it would avoid the extra data structures needed to route
messages to partitions on the sending side. Anyway, I'm thinking I should open
a JIRA for this and try it as a separate issue.
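A minimal sketch of the grouping idea above, under stated assumptions: the WorkerInfo stand-in, the bundleByWorker helper, and the requestHash method are all illustrative names, not Giraph's actual API. The point is only that outgoing messages keyed by destination worker (host/port) collapse into one bundle per worker, and that a host/port hash could stand in for a partitionId when sequencing requests.

```java
import java.util.*;

// Hypothetical sketch: group outgoing messages by destination worker rather
// than by PartitionOwner, so messages to the same worker can travel in one
// network request. Names are illustrative, not Giraph's real classes.
public class WorkerBundling {

    // Minimal stand-in for Giraph's WorkerInfo: identifies a worker by host/port.
    static final class WorkerInfo {
        final String host;
        final int port;
        WorkerInfo(String host, int port) { this.host = host; this.port = port; }

        // A host/port hash could replace the partitionId in ClientRequestId
        // for request sequencing, as suggested in the comment above.
        int requestHash() { return Objects.hash(host, port); }

        @Override public boolean equals(Object o) {
            if (!(o instanceof WorkerInfo)) return false;
            WorkerInfo w = (WorkerInfo) o;
            return port == w.port && host.equals(w.host);
        }
        @Override public int hashCode() { return requestHash(); }
    }

    // Collect (destinationWorker, message) pairs so each worker gets one bundle.
    static Map<WorkerInfo, List<String>> bundleByWorker(
            List<Map.Entry<WorkerInfo, String>> outgoing) {
        Map<WorkerInfo, List<String>> bundles = new HashMap<>();
        for (Map.Entry<WorkerInfo, String> e : outgoing) {
            bundles.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                   .add(e.getValue());
        }
        return bundles;
    }

    public static void main(String[] args) {
        WorkerInfo w1 = new WorkerInfo("host-a", 30000);
        WorkerInfo w2 = new WorkerInfo("host-b", 30000);
        List<Map.Entry<WorkerInfo, String>> outgoing = Arrays.asList(
            new AbstractMap.SimpleEntry<>(w1, "m1"),
            new AbstractMap.SimpleEntry<>(w2, "m2"),
            new AbstractMap.SimpleEntry<>(w1, "m3"));
        Map<WorkerInfo, List<String>> bundles = bundleByWorker(outgoing);
        System.out.println(bundles.get(w1).size()); // prints 2 (m1 and m3 bundled)
        System.out.println(bundles.get(w2).size()); // prints 1
    }
}
```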
When the code died using my attempt at spill to disk, it died right at the
beginning of the run, as soon as the graph data was loaded and the messaging
started, same as without it. My explanation in the previous comment described
how the amortizing code died, sorry for the confusion. Anyway, from your
explanation it sounds like I did not have it configured right if it was doing
that. I am excited to try it again.
> Run Length Encoding for Vertex#sendMessageToAllEdges might curb out of
> control message growth in large scale jobs
> -----------------------------------------------------------------------------------------------------------------
>
> Key: GIRAPH-322
> URL: https://issues.apache.org/jira/browse/GIRAPH-322
> Project: Giraph
> Issue Type: Improvement
> Components: bsp
> Affects Versions: 0.2.0
> Reporter: Eli Reisman
> Assignee: Eli Reisman
> Priority: Minor
> Fix For: 0.2.0
>
> Attachments: GIRAPH-322-1.patch, GIRAPH-322-2.patch,
> GIRAPH-322-3.patch, GIRAPH-322-4.patch
>
>
> Vertex#sendMessageToAllEdges is a case that goes against the grain of the
> data structures and code paths used to transport messages through a Giraph
> application and out onto the network. While messages bound for a single
> vertex can (and should) be combined in some applications that use this
> broadcast messaging, the out-of-control message growth of algorithms like
> triangle closing means we need to de-duplicate messages bound for many
> vertices/partitions.
> This will be an evolving solution (this first patch is just the first step),
> and it does not yet present a robust solution for disk-spill message stores.
> I figure I can get some advice about that, or it can be a follow-up JIRA if
> this turns out to be a fruitful pursuit. This first patch is also Netty-only
> and simply falls back to the old sendMessageToAllEdges() implementation if
> USE_NETTY is false. All this can be cleaned up once we know this works and/or
> is worth pursuing.
> The idea is to send as few broadcast messages as possible by run-length
> encoding their delivery, duplicating a message on the network only when it is
> bound for different partitions. This works best when combined with
> "-Dhash.userPartitionCount=# of workers" so you don't do too much of that.
> If this shows promise I will report back and keep working on this. As it is,
> it represents an end-to-end solution, using Netty, for in-memory messaging.
> It won't break with spill to disk, but you do lose the de-duplicating effect.
> More to follow, comments/ideas welcome. I expect this to change a lot as I
> test it and ideas/suggestions crop up.
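The de-duplication idea in the description above could be sketched roughly as follows. This is a hedged illustration, not the patch's actual code: PartitionBundle and encodeBroadcast are hypothetical names, and it assumes Giraph's default hash partitioning of vertex ids. The message crosses the wire once per destination partition, paired with the run of target vertex ids there, instead of once per target vertex.

```java
import java.util.*;

// Hypothetical sketch of per-partition broadcast de-duplication: one copy of
// the message per destination partition plus the target vertex ids, rather
// than one copy per target vertex. Names are illustrative, not Giraph's API.
public class BroadcastDedup {

    // One network payload per partition: the message plus its recipients.
    static final class PartitionBundle {
        final String message;
        final List<Long> targetVertexIds = new ArrayList<>();
        PartitionBundle(String message) { this.message = message; }
    }

    // Map each target vertex to its partition (simple hash partitioning,
    // mirroring Giraph's default) and keep one message copy per partition.
    static Map<Integer, PartitionBundle> encodeBroadcast(
            String message, List<Long> targets, int partitionCount) {
        Map<Integer, PartitionBundle> perPartition = new HashMap<>();
        for (long vertexId : targets) {
            int partition = (int) (Math.abs(vertexId) % partitionCount);
            perPartition.computeIfAbsent(partition, p -> new PartitionBundle(message))
                        .targetVertexIds.add(vertexId);
        }
        return perPartition;
    }

    public static void main(String[] args) {
        // Six target vertices but only two partitions: the broadcast message
        // is duplicated on the network only twice, not six times.
        Map<Integer, PartitionBundle> enc =
            encodeBroadcast("hello", Arrays.asList(0L, 1L, 2L, 3L, 4L, 5L), 2);
        System.out.println(enc.size()); // prints 2
    }
}
```

Fewer partitions per worker (as with "-Dhash.userPartitionCount=# of workers") means fewer duplicate copies of each broadcast message, which is why the description recommends that setting.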
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira