[
https://issues.apache.org/jira/browse/GIRAPH-322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eli Reisman updated GIRAPH-322:
-------------------------------
Attachment: GIRAPH-322-4.patch
This patch adds some tweaks and improvements. I tried several ways to remove the
"duplication-per-partition" on the sender side, and learned this:
1) it can totally be done, and would deduplicate a lot of messages for all code
paths from Vertex#sendMessage etc.
2) it touches more code than I feel comfortable including in this JIRA when it
should really be a separate JIRA and we should do sendMessage() and
sendMessageToAllEdges() at the same time.
3) I can test GIRAPH-322 just fine using "-Dhash.userPartitionCount=# of
workers" to see what comes of this, and get this committed as its own fix,
rolling the per-partition deduplication in the code into the other JIRA mentioned
in #2. That idea can then be judged on its own merits (or not).
4) For future reference, the JIRA mentioned in #2 would require the
WorkerInfo/PartitionOwner type plumbing to be per-worker instances rather than
per-partition, and would require the Netty request acks like
ClientRequestId to use the host-port combo for that worker as a
"destinationWorkerId" rather than the WorkerInfo's partitionId. That's about it.
This would be a good JIRA, a real win I think.
So, here's a version that warrants some testing. I'm still on a laptop, but
when I get my Giraph rig set up again at home I will definitely begin doing
so. More soon...
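To illustrate the per-partition de-duplication idea discussed above, here is a minimal, hypothetical sketch (the class and method names are illustrative, not Giraph's actual API): rather than sending one copy of a broadcast message per destination vertex, the sender groups destination vertex ids by partition, so each partition receives the message payload exactly once alongside a run of destination ids.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of run-length encoding a broadcast delivery:
// group destination vertex ids by partition so the message body is
// duplicated once per partition, not once per destination vertex.
// Names here are illustrative and do not reflect Giraph's real classes.
public class RunLengthBroadcast {
    /**
     * Groups destination vertex ids by partition using a simple hash
     * partitioner (assumed here for illustration). Each map entry
     * corresponds to one copy of the message on the wire.
     */
    public static Map<Integer, List<Integer>> groupByPartition(
            List<Integer> destVertexIds, int partitionCount) {
        Map<Integer, List<Integer>> perPartition = new HashMap<>();
        for (int vertexId : destVertexIds) {
            int partition = Math.abs(vertexId % partitionCount);
            perPartition
                .computeIfAbsent(partition, p -> new ArrayList<>())
                .add(vertexId);
        }
        return perPartition;
    }
}
```

With "-Dhash.userPartitionCount=# of workers", partitions map one-to-one onto workers, so this grouping also bounds network copies to one per worker.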
> Run Length Encoding for Vertex#sendMessageToAllEdges might curb out of
> control message growth in large scale jobs
> -----------------------------------------------------------------------------------------------------------------
>
> Key: GIRAPH-322
> URL: https://issues.apache.org/jira/browse/GIRAPH-322
> Project: Giraph
> Issue Type: Improvement
> Components: bsp
> Affects Versions: 0.2.0
> Reporter: Eli Reisman
> Assignee: Eli Reisman
> Priority: Minor
> Fix For: 0.2.0
>
> Attachments: GIRAPH-322-1.patch, GIRAPH-322-2.patch,
> GIRAPH-322-3.patch, GIRAPH-322-4.patch
>
>
> Vertex#sendMessageToAllEdges is a case that goes against the grain of the
> data structures and code paths used to transport messages through a Giraph
> application and out on the network. While messages to a single vertex can be
> combined (and should be) in some applications that could make use of this
> broadcast messaging, the out of control message growth of algorithms like
> triangle closing means we need to de-duplicate messages bound for many
> vertices/partitions.
> This will be an evolving solution (this first patch is just the first step)
> and currently it does not present a robust solution for disk-spill message
> stores. I figure I can get some advice about that or it can be a follow-up
> JIRA if this turns out to be a fruitful pursuit. This first patch is also
> Netty-only and simply defaults to the old sendMessageToAllEdges()
> implementation if USE_NETTY is false. All this can be cleaned up when we know
> this works and/or is worth pursuing.
> The idea is to send as few broadcast messages as possible by run-length
> encoding their delivery and only duplicating messages on the network when they
> are bound for different partitions. This is also best when combined with
> "-Dhash.userPartitionCount=# of workers" so you don't do too much of that.
> If this shows promise I will report back and keep working on this. As it is,
> it represents an end-to-end solution, using Netty, for in-memory messaging.
> It won't break with spill to disk, but you do lose the de-duplicating effect.
> More to follow, comments/ideas welcome. I expect this to change a lot as I
> test it and ideas/suggestions crop up.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira