[ https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103422#comment-13103422 ]

Avery Ching commented on GIRAPH-12:
-----------------------------------

Hyunsik, just to update: I grabbed your patch and it passed the unit tests on 
my machine.  Then I ran it on a cluster at Yahoo!.

I didn't have time to write a messaging benchmark, so I ran PageRankBenchmark 
with 100 workers, 1 M vertices, 3 supersteps, and 10 edges per vertex.

Here are 2 runs with the original code:

11/09/13 07:02:08 INFO mapred.JobClient:   Giraph Timers
11/09/13 07:02:08 INFO mapred.JobClient:     Total (milliseconds)=46709
11/09/13 07:02:08 INFO mapred.JobClient:     Superstep 3 (milliseconds)=1682
11/09/13 07:02:08 INFO mapred.JobClient:     Setup (milliseconds)=3228
11/09/13 07:02:08 INFO mapred.JobClient:     Shutdown (milliseconds)=1223
11/09/13 07:02:08 INFO mapred.JobClient:     Vertex input superstep 
(milliseconds)=3578
11/09/13 07:02:08 INFO mapred.JobClient:     Superstep 0 (milliseconds)=16222
11/09/13 07:02:08 INFO mapred.JobClient:     Superstep 2 (milliseconds)=12302
11/09/13 07:02:08 INFO mapred.JobClient:     Superstep 1 (milliseconds)=8467

11/09/13 07:14:51 INFO mapred.JobClient:   Giraph Timers
11/09/13 07:14:51 INFO mapred.JobClient:     Total (milliseconds)=51475
11/09/13 07:14:51 INFO mapred.JobClient:     Superstep 3 (milliseconds)=1348
11/09/13 07:14:51 INFO mapred.JobClient:     Setup (milliseconds)=7233
11/09/13 07:14:51 INFO mapred.JobClient:     Shutdown (milliseconds)=884
11/09/13 07:14:51 INFO mapred.JobClient:     Vertex input superstep 
(milliseconds)=3284
11/09/13 07:14:51 INFO mapred.JobClient:     Superstep 0 (milliseconds)=22213
11/09/13 07:14:51 INFO mapred.JobClient:     Superstep 2 (milliseconds)=8553
11/09/13 07:14:51 INFO mapred.JobClient:     Superstep 1 (milliseconds)=7955


Here are 2 runs with your code:

11/09/13 07:06:56 INFO mapred.JobClient:   Giraph Timers
11/09/13 07:06:56 INFO mapred.JobClient:     Total (milliseconds)=51935
11/09/13 07:06:56 INFO mapred.JobClient:     Superstep 3 (milliseconds)=1150
11/09/13 07:06:56 INFO mapred.JobClient:     Setup (milliseconds)=3338
11/09/13 07:06:56 INFO mapred.JobClient:     Shutdown (milliseconds)=833
11/09/13 07:06:56 INFO mapred.JobClient:     Vertex input superstep 
(milliseconds)=3401
11/09/13 07:06:56 INFO mapred.JobClient:     Superstep 0 (milliseconds)=17297
11/09/13 07:06:56 INFO mapred.JobClient:     Superstep 2 (milliseconds)=14384
11/09/13 07:06:56 INFO mapred.JobClient:     Superstep 1 (milliseconds)=11528

11/09/13 07:12:09 INFO mapred.JobClient:   Giraph Timers
11/09/13 07:12:09 INFO mapred.JobClient:     Total (milliseconds)=51985
11/09/13 07:12:09 INFO mapred.JobClient:     Superstep 3 (milliseconds)=1362
11/09/13 07:12:09 INFO mapred.JobClient:     Setup (milliseconds)=3776
11/09/13 07:12:09 INFO mapred.JobClient:     Shutdown (milliseconds)=710
11/09/13 07:12:09 INFO mapred.JobClient:     Vertex input superstep 
(milliseconds)=3771
11/09/13 07:12:09 INFO mapred.JobClient:     Superstep 0 (milliseconds)=17741
11/09/13 07:12:09 INFO mapred.JobClient:     Superstep 2 (milliseconds)=13068
11/09/13 07:12:09 INFO mapred.JobClient:     Superstep 1 (milliseconds)=11551

In my limited testing, the numbers aren't too different.  I also see that the 
connections are maintained throughout the application run, as you mentioned.  
So the only tradeoff is possibly reduced parallelization of message sending (a 
user-chosen number of threads vs. all threads).  I like the approach and think 
it's an improvement (controllable threads).  My only comment is regarding the 
following code block.

for (PeerConnection pc : peerConnections.values()) {
    futures.add(executor.submit(new PeerFlushExecutor(pc)));
}

It would probably be good to randomize the order of the PeerConnection objects 
to avoid hotspots on the receiving side.
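
A minimal sketch of that idea, combining the shuffle with a bounded executor. 
The PeerConnection class here is a stand-in with only a flush() method; the 
class names, pool size, and peer count are illustrative, not the actual patch 
code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FlushOrderSketch {
    // Stand-in for the patch's PeerConnection; only the name comes from this thread.
    static class PeerConnection {
        final String peerId;
        PeerConnection(String peerId) { this.peerId = peerId; }
        void flush() { /* would send this peer's buffered messages */ }
    }

    // Flush every peer using a bounded pool, in randomized order so that
    // all workers don't hit the same receiver first; returns peers flushed.
    static int flushAll(int numPeers, int poolSize) throws Exception {
        List<PeerConnection> connections = new ArrayList<>();
        for (int i = 0; i < numPeers; i++) {
            connections.add(new PeerConnection("worker-" + i));
        }
        Collections.shuffle(connections);  // randomize to spread receiver load

        ExecutorService executor = Executors.newFixedThreadPool(poolSize);
        List<Future<?>> futures = new ArrayList<>();
        for (PeerConnection pc : connections) {
            futures.add(executor.submit(pc::flush));
        }
        for (Future<?> f : futures) {
            f.get();  // block until every flush completes
        }
        executor.shutdown();
        return connections.size();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("flushed " + flushAll(8, 4) + " peers");
    }
}
```

Collections.shuffle gives each worker an independent flush order; seeding the 
shuffle with the worker id would keep runs reproducible while still 
decorrelating workers from each other.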


> Investigate communication improvements
> --------------------------------------
>
>                 Key: GIRAPH-12
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-12
>             Project: Giraph
>          Issue Type: Improvement
>          Components: bsp
>            Reporter: Avery Ching
>            Assignee: Hyunsik Choi
>            Priority: Minor
>         Attachments: GIRAPH-12_1.patch
>
>
> Currently every worker starts up a thread to communicate with every other 
> worker.  Hadoop RPC is used for communication.  For instance, if there are 
> 400 workers, each worker will create 400 threads.  This ends up using a lot 
> of memory, even with the option 
> -Dmapred.child.java.opts="-Xss64k".  
> It would be good to investigate using frameworks like Netty, or rolling our 
> own, to improve this situation.  By moving away from Hadoop RPC, we would 
> also make compatibility across different Hadoop versions easier.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
