[
https://issues.apache.org/jira/browse/GIRAPH-616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629861#comment-13629861
]
Claudio Martella commented on GIRAPH-616:
-----------------------------------------
Given the nice results, out of curiosity I ran the same benchmark with out-of-core
(OOC) messages enabled as well. Basically, in these tests we currently keep in
memory 2 of the 60 partitions assigned to each worker (~3%), and each worker
produces (and receives) on average 83M messages per superstep (5B edges / 60
workers). So I re-ran the tests adding giraph.maxMessagesInMemory=2490000, which
again is about 3%; a configuration sketch is shown right below, and the results
follow it. I would be curious to see how long it would take to run the same
number of iterations on the same graph, with the same number of tasks, in MR.
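For reference, a minimal sketch of how a run like this could be configured
programmatically. Only giraph.maxMessagesInMemory=2490000 is taken from the run
described above; the giraph.useOutOfCoreMessages flag, the GiraphConfiguration
package and the wrapper class are assumptions based on the Giraph 1.0-era API.

  import org.apache.giraph.conf.GiraphConfiguration;

  public class OocMessagesBenchmarkConfig {
    // Keep only ~2.49M messages (~3% of the ~83M each worker handles per
    // superstep) in memory; the rest is spilled to disk by the message store.
    public static GiraphConfiguration build() {
      GiraphConfiguration conf = new GiraphConfiguration();
      conf.setBoolean("giraph.useOutOfCoreMessages", true); // assumed flag name
      conf.setInt("giraph.maxMessagesInMemory", 2490000);   // value from this run
      return conf;
    }
  }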
13/04/12 08:45:43 INFO mapred.JobClient: Total (milliseconds)=2132200
13/04/12 08:45:43 INFO mapred.JobClient: Superstep 3 (milliseconds)=205329
13/04/12 08:45:43 INFO mapred.JobClient: Superstep 4 (milliseconds)=198965
13/04/12 08:45:43 INFO mapred.JobClient: Superstep 10 (milliseconds)=109850
13/04/12 08:45:43 INFO mapred.JobClient: Setup (milliseconds)=25407
13/04/12 08:45:43 INFO mapred.JobClient: Shutdown (milliseconds)=83
13/04/12 08:45:43 INFO mapred.JobClient: Superstep 7 (milliseconds)=200026
13/04/12 08:45:43 INFO mapred.JobClient: Superstep 9 (milliseconds)=203015
13/04/12 08:45:43 INFO mapred.JobClient: Superstep 0 (milliseconds)=110034
13/04/12 08:45:43 INFO mapred.JobClient: Superstep 8 (milliseconds)=200514
13/04/12 08:45:43 INFO mapred.JobClient: Input superstep (milliseconds)=40560
13/04/12 08:45:43 INFO mapred.JobClient: Superstep 6 (milliseconds)=204376
13/04/12 08:45:43 INFO mapred.JobClient: Superstep 5 (milliseconds)=199704
13/04/12 08:45:43 INFO mapred.JobClient: Superstep 2 (milliseconds)=204082
13/04/12 08:45:43 INFO mapred.JobClient: Superstep 1 (milliseconds)=230250
The results are not bad: each superstep lasts around 3 times longer. But if you
consider that we are keeping less than 10 MB of messages in memory per worker
(assuming 4-byte float messages), it is quite understandable at this scale. I
think that overall we keep less data in memory than MR's default buffers (for
sorting and for I/O). I'd like to test merging the DiskBackedMessageStore files
in the background, to see whether reducing the number of files and disk seeks
makes a difference (though wouldn't the merge incur roughly as much total I/O
as we do now?).
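As a back-of-envelope check of the figure above (assuming 4-byte float messages,
as stated):

  2,490,000 messages kept in memory x 4 bytes ≈ 10 MB per worker
  83,000,000 messages per superstep x 4 bytes ≈ 332 MB per worker that would
  otherwise have to be buffered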
> Decouple vertices and edges in DiskBackedPartitionStore and avoid writing
> back edges when the algorithm does not change topology.
> ---------------------------------------------------------------------------------------------------------------------------------
>
> Key: GIRAPH-616
> URL: https://issues.apache.org/jira/browse/GIRAPH-616
> Project: Giraph
> Issue Type: Improvement
> Reporter: Claudio Martella
> Assignee: Claudio Martella
> Attachments: GIRAPH-616.diff, GIRAPH-616.diff
>
>
> Many algorithms work on a static graph. In these cases, when running with the
> out-of-core graph mechanism, we end up writing back edges that have not
> changed since we read them. By decoupling vertices and edges, we can write
> back only the freshly computed vertex values.
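For illustration of the decoupling idea only: a minimal sketch with invented
class and method names (this is not the attached GIRAPH-616 patch). A
partition's edges are spilled to disk once and rewritten only if the topology
mutates, while vertex values are written back on every offload.

  import java.io.File;
  import java.io.FileOutputStream;
  import java.io.IOException;
  import java.io.ObjectOutputStream;
  import java.util.List;
  import java.util.Map;

  class DecoupledPartitionSpiller<V> {
    private final File vertexFile;
    private final File edgeFile;
    private boolean edgesSpilled = false;

    DecoupledPartitionSpiller(File dir, int partitionId) {
      this.vertexFile = new File(dir, "partition-" + partitionId + "-vertices");
      this.edgeFile = new File(dir, "partition-" + partitionId + "-edges");
    }

    // Offload a partition to disk. Edges are written only the first time, or
    // when the algorithm has mutated the topology; vertex values are always
    // written back because compute() may have changed them.
    void offload(Map<Long, V> vertexValues,
                 Map<Long, List<Long>> outEdges,
                 boolean topologyMutated) throws IOException {
      if (!edgesSpilled || topologyMutated) {
        write(edgeFile, outEdges);       // expensive: full adjacency lists
        edgesSpilled = true;
      }
      write(vertexFile, vertexValues);   // cheap: only the fresh vertex values
    }

    private static void write(File f, Object o) throws IOException {
      try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(f))) {
        out.writeObject(o);
      }
    }
  }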
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira