[
https://issues.apache.org/jira/browse/GIRAPH-616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629861#comment-13629861
]
Claudio Martella commented on GIRAPH-616:
-----------------------------------------
Given the nice results, out of curiosity I ran the same benchmark with out-of-core
(OOC) messages enabled as well. Basically, in these tests we currently keep in
memory 2 of the 60 partitions assigned to each worker (~3%), and each worker
produces (and receives) on average 83M messages per superstep (5B edges / 60
workers). So I re-ran the tests adding giraph.maxMessagesInMemory=2490000, which
again is about 3%; a configuration sketch is shown right below, and the results
follow it. I would be curious to see how long it would take to run the same
number of iterations on the same graph, with the same number of tasks, in MR.
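For reference, a minimal sketch of how a run like this could be configured
programmatically. Only giraph.maxMessagesInMemory=2490000 is taken from the run
described above; the giraph.useOutOfCoreMessages flag, the GiraphConfiguration
package and the wrapper class are assumptions based on the Giraph 1.0-era API.

  import org.apache.giraph.conf.GiraphConfiguration;

  public class OocMessagesBenchmarkConfig {
    // Keep only ~2.49M messages (~3% of the ~83M each worker handles per
    // superstep) in memory; the rest is spilled to disk by the message store.
    public static GiraphConfiguration build() {
      GiraphConfiguration conf = new GiraphConfiguration();
      conf.setBoolean("giraph.useOutOfCoreMessages", true); // assumed flag name
      conf.setInt("giraph.maxMessagesInMemory", 2490000);   // value from this run
      return conf;
    }
  }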
13/04/12 08:45:43 INFO mapred.JobClient: Total (milliseconds)=2132200
13/04/12 08:45:43 INFO mapred.JobClient: Superstep 3 (milliseconds)=205329
13/04/12 08:45:43 INFO mapred.JobClient: Superstep 4 (milliseconds)=198965
13/04/12 08:45:43 INFO mapred.JobClient: Superstep 10 (milliseconds)=109850
13/04/12 08:45:43 INFO mapred.JobClient: Setup (milliseconds)=25407
13/04/12 08:45:43 INFO mapred.JobClient: Shutdown (milliseconds)=83
13/04/12 08:45:43 INFO mapred.JobClient: Superstep 7 (milliseconds)=200026
13/04/12 08:45:43 INFO mapred.JobClient: Superstep 9 (milliseconds)=203015
13/04/12 08:45:43 INFO mapred.JobClient: Superstep 0 (milliseconds)=110034
13/04/12 08:45:43 INFO mapred.JobClient: Superstep 8 (milliseconds)=200514
13/04/12 08:45:43 INFO mapred.JobClient: Input superstep (milliseconds)=40560
13/04/12 08:45:43 INFO mapred.JobClient: Superstep 6 (milliseconds)=204376
13/04/12 08:45:43 INFO mapred.JobClient: Superstep 5 (milliseconds)=199704
13/04/12 08:45:43 INFO mapred.JobClient: Superstep 2 (milliseconds)=204082
13/04/12 08:45:43 INFO mapred.JobClient: Superstep 1 (milliseconds)=230250
The results are not bad: each superstep lasts around 3 times longer. But if you
consider that we are keeping less than 10 MB of messages in memory per worker
(assuming 4-byte float messages), it is quite understandable at this scale. I
think that overall we keep less data in memory than MR's default buffers (for
sorting and for I/O). I'd like to test merging the DiskBackedMessageStore files
in the background, to see whether reducing the number of files and disk seeks
makes a difference (though wouldn't the merge incur roughly as much total I/O
as we do now?).
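As a back-of-envelope check of the figure above (assuming 4-byte float messages,
as stated):

  2,490,000 messages kept in memory x 4 bytes ≈ 10 MB per worker
  83,000,000 messages per superstep x 4 bytes ≈ 332 MB per worker that would
  otherwise have to be buffered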
> Decouple vertices and edges in DiskBackedPartitionStore and avoid writing
> back edges when the algorithm does not change topology.
> ---------------------------------------------------------------------------------------------------------------------------------
>
> Key: GIRAPH-616
> URL: https://issues.apache.org/jira/browse/GIRAPH-616
> Project: Giraph
> Issue Type: Improvement
> Reporter: Claudio Martella
> Assignee: Claudio Martella
> Attachments: GIRAPH-616.diff, GIRAPH-616.diff
>
>
> Many algorithms work on a static graph. In these cases, when running with the
> out-of-core graph mechanism, we end up writing back edges that have not
> changed since we read them. By decoupling vertices and edges, we can write
> back only the freshly computed vertex values.
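For illustration of the decoupling idea only: a minimal sketch with invented
class and method names (this is not the attached GIRAPH-616 patch). A
partition's edges are spilled to disk once and rewritten only if the topology
mutates, while vertex values are written back on every offload.

  import java.io.File;
  import java.io.FileOutputStream;
  import java.io.IOException;
  import java.io.ObjectOutputStream;
  import java.util.List;
  import java.util.Map;

  class DecoupledPartitionSpiller<V> {
    private final File vertexFile;
    private final File edgeFile;
    private boolean edgesSpilled = false;

    DecoupledPartitionSpiller(File dir, int partitionId) {
      this.vertexFile = new File(dir, "partition-" + partitionId + "-vertices");
      this.edgeFile = new File(dir, "partition-" + partitionId + "-edges");
    }

    // Offload a partition to disk. Edges are written only the first time, or
    // when the algorithm has mutated the topology; vertex values are always
    // written back because compute() may have changed them.
    void offload(Map<Long, V> vertexValues,
                 Map<Long, List<Long>> outEdges,
                 boolean topologyMutated) throws IOException {
      if (!edgesSpilled || topologyMutated) {
        write(edgeFile, outEdges);       // expensive: full adjacency lists
        edgesSpilled = true;
      }
      write(vertexFile, vertexValues);   // cheap: only the fresh vertex values
    }

    private static void write(File f, Object o) throws IOException {
      try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(f))) {
        out.writeObject(o);
      }
    }
  }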
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira