Craig Muchinsky created GIRAPH-800:
--------------------------------------

             Summary: Resolving mutations on a large graph causes timeouts
                 Key: GIRAPH-800
                 URL: https://issues.apache.org/jira/browse/GIRAPH-800
             Project: Giraph
          Issue Type: Bug
          Components: graph
    Affects Versions: 1.1.0
         Environment: hadoop1
            Reporter: Craig Muchinsky


When processing a graph with a large number of mutations and/or a large number 
of messages per superstep, the pre-superstep logic can appear to hang, and the 
job eventually times out, either because of MapReduce task inactivity or 
because it hits the maximum superstep wait.

While it's possible to tune around this by adding a strategic call to 
context.progress() in NettyServerWorker.resolveMutations() and increasing the 
giraph.maxMasterSuperstepWaitMsecs setting, this part of the code probably 
needs some optimization.
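
To make the progress-reporting part of that tuning concrete, here is a minimal,
self-contained sketch. It is not Giraph code: ProgressReporter is a
hypothetical stand-in for the Hadoop context's progress() method, and
processWithProgress() and reportEvery are names invented for the example. It
only shows the pattern of reporting progress every N items inside a
long-running loop so the task is not considered inactive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class ProgressSketch {
    /** Hypothetical stand-in for the Hadoop context's progress() call. */
    interface ProgressReporter {
        void progress();
    }

    /**
     * Applies work to every item and reports progress once per
     * reportEvery processed items. Returns the number of items processed.
     */
    static <T> int processWithProgress(Iterable<T> items, Consumer<T> work,
                                       ProgressReporter reporter, int reportEvery) {
        if (reportEvery <= 0) {
            throw new IllegalArgumentException("reportEvery must be positive");
        }
        int processed = 0;
        for (T item : items) {
            work.accept(item);
            processed++;
            // Signal liveness periodically instead of on every item,
            // so the reporting itself stays cheap.
            if (processed % reportEvery == 0) {
                reporter.progress();
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        List<Integer> vertexIds = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            vertexIds.add(i);
        }
        int[] calls = {0};
        int processed = processWithProgress(vertexIds, id -> { }, () -> calls[0]++, 3);
        System.out.println(processed + " items, " + calls[0] + " progress calls");
    }
}
```

In the real code the reporter would be the task context, and the interval
would need to be small enough that two consecutive reports never fall further
apart than the task inactivity timeout. This only keeps the task alive; it does
not shorten the superstep transition itself.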

As an example, in a graph with 2B vertices and 2.5B edges the transition 
between supersteps with 1B messages in flight can take 15-30 minutes on a 
cluster with 228 workers (2 threads, 8GB RAM per worker).

While the vertex resolve processing can be time consuming, I believe it's the 
check for missing vertices (the second loop within 
NettyServerWorker.resolveMutations()) that is the real performance bottleneck. 
I haven't identified a fix for this logic yet, but I did identify a 
possible workaround. I believe that when dealing with a static and complete 
graph the resolveMutations() call can be skipped altogether. A quick test of 
this theory yielded a 3x performance improvement in my sandbox.
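
The idea behind the workaround can be sketched as follows. This is a simplified
model, not the actual Giraph implementation: it assumes the missing-vertex
check amounts to testing every message destination against the vertices
already present, and the staticGraph flag, the class name, and the method name
are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

class MissingVertexCheckSketch {
    /**
     * Returns the destination ids that have no local vertex.
     * When staticGraph is true (hypothetical flag), the scan is skipped
     * on the assumption that every destination already exists.
     */
    static Set<Long> findMissing(Set<Long> localVertexIds,
                                 Iterable<Long> destinations,
                                 boolean staticGraph) {
        Set<Long> missing = new LinkedHashSet<>();
        if (staticGraph) {
            // Static and complete graph: nothing can be missing, skip the scan.
            return missing;
        }
        // One lookup per message destination; with a very large number of
        // destinations this loop is the part that is suspected to dominate.
        for (Long id : destinations) {
            if (!localVertexIds.contains(id)) {
                missing.add(id);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Set<Long> vertices = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        Iterable<Long> destinations = Arrays.asList(1L, 3L, 4L);
        System.out.println(findMissing(vertices, destinations, false));
        System.out.println(findMissing(vertices, destinations, true));
    }
}
```

The skip is only safe if the graph really is static and complete. If a message
is addressed to a vertex that does not exist, skipping the check means that
vertex is never resolved, so an option like this would have to be opt-in.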



--
This message was sent by Atlassian JIRA
(v6.1#6144)
