[ https://issues.apache.org/jira/browse/GIRAPH-388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486285#comment-13486285 ]
Eli Reisman commented on GIRAPH-388:
------------------------------------
I wasn't really associating 314 with this patch, don't know...? The advantage
with the maps of maps approach in 328 was that we took pains not to needlessly
duplicate data in the data structures when we could reference the same message
(or whatever) multiple times. I think the ideas in 322 (way outdated now) are
more in opposition to this new approach than 314. I was comparing this patch
with the 328 patch.
As we move forward I hope we don't re-introduce a bunch of data duplication,
but instead move towards eliminating it from the data structures and the disk
spill format. If we trade speed for space too often during this process, we
will be hurting the in-memory use cases to favor the disk-spill cases.
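To make the duplication concern concrete, here is a minimal sketch of the
difference between referencing one message object from several map entries and
copying it per destination. Everything in it is hypothetical: the class, the
method names, the worker/vertex id types and the byte[] payload are assumptions
chosen for illustration, not code from the 328 patch or from SendMessageCache.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these names come from the Giraph codebase. */
public class SharedMessageSketch {

    /** Nested layout (assumed): worker id -> destination vertex id -> messages. */
    static Map<Integer, Map<Long, List<byte[]>>> newCache() {
        return new HashMap<>();
    }

    /** Queue a message by reference; the payload itself is not copied. */
    static void addShared(Map<Integer, Map<Long, List<byte[]>>> cache,
                          int workerId, long vertexId, byte[] message) {
        cache.computeIfAbsent(workerId, w -> new HashMap<>())
             .computeIfAbsent(vertexId, v -> new ArrayList<>())
             .add(message);
    }

    /** Queue a private copy of the message, duplicating the payload. */
    static void addCopied(Map<Integer, Map<Long, List<byte[]>>> cache,
                          int workerId, long vertexId, byte[] message) {
        addShared(cache, workerId, vertexId, message.clone());
    }

    /** Count distinct payload objects in the cache (by identity, not equality). */
    static int distinctPayloads(Map<Integer, Map<Long, List<byte[]>>> cache) {
        Map<byte[], Boolean> seen = new IdentityHashMap<>();
        for (Map<Long, List<byte[]>> perVertex : cache.values()) {
            for (List<byte[]> messages : perVertex.values()) {
                for (byte[] m : messages) {
                    seen.put(m, Boolean.TRUE);
                }
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        byte[] message = new byte[1024];      // one payload sent to many vertices
        long[] targets = {1L, 2L, 3L, 4L};

        Map<Integer, Map<Long, List<byte[]>>> shared = newCache();
        Map<Integer, Map<Long, List<byte[]>>> copied = newCache();
        for (long target : targets) {
            int worker = (int) (target % 2);  // toy partitioning, an assumption
            addShared(shared, worker, target, message);
            addCopied(copied, worker, target, message);
        }

        System.out.println("shared payload objects: " + distinctPayloads(shared));
        System.out.println("copied payload objects: " + distinctPayloads(copied));
    }
}
```

With sharing, the cache holds a single payload object no matter how many
destinations it is queued for; with copying, the payload count grows with the
number of destinations. The nested maps still carry their own per-entry
overhead in both cases, which is the cost the issue description below is
concerned with.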
Be careful, as you do a larger redesign, to evaluate on real data rather than
the benchmarks. The behavior is so different, and the benchmarks are so
forgiving, that the impressions you get of performance will be night and day
with real social graph data. This will be reflected in many more facets than
just the frequency of duplicated vertex IDs in the VIMC. There are parts of the
code I'm not willing to touch until I have a good-size cluster to run real
data on.
There is so much change in this area of the codebase right now (and I have been
so busy) that I have let 322 and 314 lie fallow for a while. I think I will lay
off working on them until I see what you guys have in mind for this part
of the code. Maybe there won't be a need for either of them!
> Improve the way we keep outgoing messages
> -----------------------------------------
>
> Key: GIRAPH-388
> URL: https://issues.apache.org/jira/browse/GIRAPH-388
> Project: Giraph
> Issue Type: Improvement
> Reporter: Maja Kabiljo
> Assignee: Maja Kabiljo
> Attachments: GIRAPH-388.patch
>
>
> As per the discussion on GIRAPH-357, in a standard application the chances
> that we get to use the client-side combiner are very low. I experimented with
> the benefits we can get from not having the client-side combiner at all. It
> turns out that having a lot of maps in SendMessageCache, and then a collection
> inside each of them, really hurts performance.