[ https://issues.apache.org/jira/browse/GIRAPH-388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maja Kabiljo updated GIRAPH-388:
--------------------------------
Attachment: GIRAPH-388.patch
By replacing the per-partition storage with two simple lists, I got the
following improvements:
PageRankBenchmark, 30M vertices, 100 edges per vertex, 40 workers
Single computation thread:
  Average computation time: 80s -> 48s
  Total superstep time:     85s -> 56s
10 computation threads:
  Average computation time: 14s -> 8s
  Total superstep time:     21s -> 17s
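To illustrate what "two simple lists" could look like, here is a minimal sketch assuming one such object per message destination, holding a list of target vertex IDs and a parallel list of messages. The class name VertexIdMessageList and its methods are hypothetical and not taken from the patch; the sketch also uses plain Java generics, whereas Giraph itself works with Writable types.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: outgoing messages for one destination kept as two
 * parallel lists (target ids and messages) instead of a map from vertex id
 * to a per-vertex collection. Illustrative only, not the patch's code.
 */
class VertexIdMessageList<I, M> {
  // Invariant: ids.get(k) is the target vertex of messages.get(k).
  private final List<I> ids = new ArrayList<>();
  private final List<M> messages = new ArrayList<>();

  /** Amortized O(1): no hashing and no per-vertex collection to allocate. */
  void add(I targetId, M message) {
    ids.add(targetId);
    messages.add(message);
  }

  int size() {
    return ids.size();
  }

  I getId(int k) {
    return ids.get(k);
  }

  M getMessage(int k) {
    return messages.get(k);
  }

  /** Reset after the batch has been sent to the destination worker. */
  void clear() {
    ids.clear();
    messages.clear();
  }

  public static void main(String[] args) {
    VertexIdMessageList<Long, Double> out = new VertexIdMessageList<>();
    out.add(7L, 0.25);
    out.add(3L, 0.5);
    out.add(7L, 0.125); // same target again: just another entry, no grouping
    for (int k = 0; k < out.size(); k++) {
      System.out.println(out.getId(k) + " <- " + out.getMessage(k));
    }
  }
}
```

Compared with a map of per-vertex collections, sending a message costs two list appends; any grouping by target vertex would presumably happen on the receiving side instead.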
I can change DiskBackedMessageStoreByPartition so that it doesn't have to copy
values into the map, by adding more methods to the message store interfaces.
But I think those interfaces will need to be revised soon anyway because of
this and upcoming changes.
This patch removes the client-side combiner completely. Does anyone have an
application that will suffer because of it? If needed, we can have two
implementations of something like VertexIdMessageCollection, one of which
would still allow a combiner to be used.
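One way the second, combining implementation could look: instead of appending every message, keep at most one pending message per target vertex and merge new messages into it. This is a hypothetical sketch; CombiningVertexIdMessages is an illustrative name, and the combiner is modelled as a java.util.function.BinaryOperator rather than Giraph's own combiner interface.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

/**
 * Hypothetical combining variant: one combined message per target vertex.
 * The combiner is modelled as a BinaryOperator (an assumption, not Giraph's API).
 */
class CombiningVertexIdMessages<I, M> {
  // LinkedHashMap keeps insertion order so iteration is deterministic.
  private final Map<I, M> combined = new LinkedHashMap<>();
  private final BinaryOperator<M> combiner;

  CombiningVertexIdMessages(BinaryOperator<M> combiner) {
    this.combiner = combiner;
  }

  /** Merges the message into any message already pending for targetId. */
  void add(I targetId, M message) {
    combined.merge(targetId, message, combiner);
  }

  int size() {
    return combined.size();
  }

  M get(I targetId) {
    return combined.get(targetId);
  }

  public static void main(String[] args) {
    // Sum combiner, as a PageRank-style computation might use.
    CombiningVertexIdMessages<Long, Double> out =
        new CombiningVertexIdMessages<>(Double::sum);
    out.add(7L, 0.25);
    out.add(3L, 0.5);
    out.add(7L, 0.125);
    System.out.println(out.size() + " " + out.get(7L)); // prints "2 0.375"
  }
}
```

The trade-off is a hash lookup per message, which only pays off when a single worker sends many messages to the same target vertex; the issue description suggests that is uncommon in standard applications.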
> Improve the way we keep outgoing messages
> -----------------------------------------
>
> Key: GIRAPH-388
> URL: https://issues.apache.org/jira/browse/GIRAPH-388
> Project: Giraph
> Issue Type: Improvement
> Reporter: Maja Kabiljo
> Attachments: GIRAPH-388.patch
>
>
> As per the discussion on GIRAPH-357, in standard applications the chances
> that we get to use the client-side combiner are very low. I experimented with
> the benefits we can get from not having the client-side combiner at all. It
> turns out that having a lot of maps in SendMessageCache, with a collection
> inside each of them, really hurts performance.
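To make the quoted problem concrete, here is a hypothetical sketch of such a nested layout: per destination partition, a map from target vertex ID to a collection of messages. The exact shape of SendMessageCache is an assumption here, and NestedMessageCache is an illustrative name. The point is that every message costs two hash lookups, and every distinct target vertex gets its own small collection.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the nested layout: partition id -> (target vertex
 * id -> messages). Shapes and names are assumptions, not SendMessageCache.
 */
class NestedMessageCache<I, M> {
  private final Map<Integer, Map<I, Collection<M>>> byPartition = new HashMap<>();

  void add(int partitionId, I targetId, M message) {
    // Two hash lookups per message, plus possibly a new map and a new list.
    byPartition
        .computeIfAbsent(partitionId, p -> new HashMap<>())
        .computeIfAbsent(targetId, id -> new ArrayList<>())
        .add(message);
  }

  /** Number of per-vertex collections currently allocated. */
  int collectionCount() {
    int n = 0;
    for (Map<I, Collection<M>> perVertex : byPartition.values()) {
      n += perVertex.size();
    }
    return n;
  }

  public static void main(String[] args) {
    NestedMessageCache<Long, Double> cache = new NestedMessageCache<>();
    // 1000 messages to 1000 distinct targets spread over 4 partitions:
    // each target vertex ends up with its own collection.
    for (long v = 0; v < 1000; v++) {
      cache.add((int) (v % 4), v, 1.0);
    }
    System.out.println(cache.collectionCount()); // prints "1000"
  }
}
```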