[ 
https://issues.apache.org/jira/browse/GIRAPH-388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maja Kabiljo updated GIRAPH-388:
--------------------------------

    Attachment: GIRAPH-388.patch

Replacing per-partition storage with two simple lists gave the following 
improvements:

PageRankBenchmark, 30m vertices, 100 edges, 40 workers

Single computation thread
Average computation time 80s -> 48s
Total superstep time 85s -> 56s
 
10 computation threads 
Average computation time 14s -> 8s
Total superstep time 21s -> 17s

I can change DiskBackedMessageStoreByPartition so that it no longer has to 
copy values to the map, by adding more methods to the message store 
interfaces. But I think those interfaces will need to be revised soon anyway 
because of this and upcoming changes.

This removes the client-side combiner completely. Does anyone have an 
application that would suffer because of it? If needed, we can have two 
implementations of something like VertexIdMessageCollection, one of which 
would still allow a combiner to be used.
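
To make the two-implementation idea concrete, here is a rough sketch only. It 
is not the code in the attached patch: the names PairCollection, 
ListPairCollection and CombiningPairCollection are hypothetical, and the real 
VertexIdMessageCollection may look quite different. The first variant keeps 
vertex ids and messages in two parallel lists with no combining; the second 
keeps one combined message per vertex id, at the cost of a map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

/** Hypothetical interface for outgoing (vertex id, message) pairs. */
interface PairCollection<I, M> {
  void add(I vertexId, M message);

  /** Number of entries currently stored. */
  int size();
}

/** No combining: two parallel lists, entry i is (vertexIds[i], messages[i]). */
final class ListPairCollection<I, M> implements PairCollection<I, M> {
  private final List<I> vertexIds = new ArrayList<>();
  private final List<M> messages = new ArrayList<>();

  @Override
  public void add(I vertexId, M message) {
    // Appending to both lists keeps them index-aligned.
    vertexIds.add(vertexId);
    messages.add(message);
  }

  @Override
  public int size() {
    return vertexIds.size();
  }

  I vertexIdAt(int index) {
    return vertexIds.get(index);
  }

  M messageAt(int index) {
    return messages.get(index);
  }
}

/** Combining variant: one message per vertex id, merged as messages arrive. */
final class CombiningPairCollection<I, M> implements PairCollection<I, M> {
  private final Map<I, M> combined = new HashMap<>();
  private final BinaryOperator<M> combiner;

  CombiningPairCollection(BinaryOperator<M> combiner) {
    this.combiner = combiner;
  }

  @Override
  public void add(I vertexId, M message) {
    // Map.merge stores the message if the id is new, else combines the two.
    combined.merge(vertexId, message, combiner);
  }

  @Override
  public int size() {
    return combined.size();
  }

  M messageFor(I vertexId) {
    return combined.get(vertexId);
  }
}
```

With a sum combiner such as Double::sum, adding two messages for the same 
vertex leaves one entry in the combining variant and two in the list variant.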
                
> Improve the way we keep outgoing messages
> -----------------------------------------
>
>                 Key: GIRAPH-388
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-388
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Maja Kabiljo
>         Attachments: GIRAPH-388.patch
>
>
> As per the discussion on GIRAPH-357, in a standard application the chances 
> that we get to use the client-side combiner are very low. I experimented 
> with the benefits we can get from not having the client-side combiner at 
> all. It turns out that having a lot of maps in SendMessageCache, and then a 
> collection inside each of them, really hurts performance. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
