[
https://issues.apache.org/jira/browse/GIRAPH-388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486398#comment-13486398
]
Claudio Martella commented on GIRAPH-388:
-----------------------------------------
Good work, Maja. You got me thinking, and I think your results make a lot of
sense. With neighborhoods of 100 vertices and 40 workers, you'd expect slightly
over 2 neighbouring vertices in the same partition (100/39). This means that,
even if we didn't stream messages out with buffering but kept them all in
memory, we'd save about one message in every two. If you consider that we
buffer a bit but flush messages as they are produced, the number of combined
messages is basically zero.
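
To make the estimate above concrete, here is a rough back-of-the-envelope
sketch. It assumes vertices are spread uniformly at random over the workers and
that a perfect combiner keeps every outgoing message in memory until the end of
the superstep; the class and method names are hypothetical and are not part of
Giraph.

```java
public class CombinerEstimate {
    /** Expected number of a vertex's neighbours hosted by one given worker. */
    public static double expectedPerWorker(int degree, int workers) {
        return (double) degree / workers;
    }

    /**
     * Expected number of messages reaching one destination vertex when every
     * worker combines all of its messages for that vertex into a single one.
     * A worker sends one message if it hosts at least one of the neighbours.
     */
    public static double expectedCombinedMessages(int degree, int workers) {
        double workerHostsNone = Math.pow(1.0 - 1.0 / workers, degree);
        return workers * (1.0 - workerHostsNone);
    }

    /** Fraction of messages saved by that idealised combiner. */
    public static double savedFraction(int degree, int workers) {
        return 1.0 - expectedCombinedMessages(degree, workers) / degree;
    }

    public static void main(String[] args) {
        int degree = 100;  // neighbourhood size from the comment
        int workers = 40;  // number of workers from the comment
        System.out.printf("neighbours per worker: %.2f%n",
                expectedPerWorker(degree, workers));
        System.out.printf("messages after combining: %.1f of %d%n",
                expectedCombinedMessages(degree, workers), degree);
        System.out.printf("saved fraction: %.2f%n",
                savedFraction(degree, workers));
    }
}
```

This is an upper bound under those assumptions: it is the best case in which
nothing is flushed early. Once buffers are flushed as messages are produced,
the chance that two messages for the same vertex sit in the same buffer drops
much further.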
This makes a lot of sense if you consider the original idea of the combiner in
MapReduce. There, the cardinality of the key set of the original input is
usually much higher than that of the intermediate set that you feed to the
reducer (otherwise you wouldn't be reducing, right?). THERE, the combiner
makes a lot of sense. Yes, we still have the same advantage of using a combiner
as with PageRank on MapReduce, because there the cardinalities are the same as
well (but the number of messages is higher; in fact the complexity is O(E),
hence the combiner makes some sense). But the architecture of the shuffle and
sort makes the cost of applying the combiner cheaper (amortized) than it is for
us.
I'm increasingly convinced that the role of the combiner is mostly to save
memory rather than anything else. So it should mainly be used server-side.
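
As an illustration of what server-side combining buys in memory, here is a
minimal sketch of a receiving store that keeps a single combined value per
destination vertex instead of a list of messages. The class, its methods, and
the choice of long vertex ids with double messages are assumptions made for
this example; this is not Giraph's actual message store.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

public class ServerSideCombinerSketch {
    // One combined value per destination vertex, rather than every message.
    private final Map<Long, Double> combined = new HashMap<>();
    private final BinaryOperator<Double> combiner;

    public ServerSideCombinerSketch(BinaryOperator<Double> combiner) {
        this.combiner = combiner;
    }

    /** Folds an incoming message into the value already held for the vertex. */
    public void receive(long destinationVertexId, double message) {
        combined.merge(destinationVertexId, message, combiner);
    }

    /** Number of values held in memory, at most one per destination vertex. */
    public int storedValues() {
        return combined.size();
    }

    /** Combined value for a vertex, or null if it received no messages. */
    public Double valueFor(long destinationVertexId) {
        return combined.get(destinationVertexId);
    }

    public static void main(String[] args) {
        // A sum combiner, as one would use for PageRank-style messages.
        ServerSideCombinerSketch store = new ServerSideCombinerSketch(Double::sum);
        store.receive(7L, 0.25);
        store.receive(7L, 0.5);
        store.receive(9L, 1.0);
        System.out.println("stored values: " + store.storedValues());
        System.out.println("vertex 7: " + store.valueFor(7L));
    }
}
```

With this layout, memory on the receiving side grows with the number of
destination vertices rather than with the number of messages, which is the
saving described above.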
> Improve the way we keep outgoing messages
> -----------------------------------------
>
> Key: GIRAPH-388
> URL: https://issues.apache.org/jira/browse/GIRAPH-388
> Project: Giraph
> Issue Type: Improvement
> Reporter: Maja Kabiljo
> Assignee: Maja Kabiljo
> Attachments: GIRAPH-388.patch
>
>
> As per the discussion on GIRAPH-357, in a standard application the chances
> that we get to use the client-side combiner are very low. I experimented with
> the benefits we can get from not having the client-side combiner at all. It
> turns out that having a lot of maps in SendMessageCache, and then a collection
> inside each of them, really hurts performance.