Avery Ching commented on GIRAPH-12:

If the default stack size is 1 MB, then for instance if you have 1024 workers, 
you are talking about 1 GB just wasted for thread stack space per node.  The 
aggregate wasted memory would be 1 GB * 1024 = 1 TB, that's a lot of memory =).
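
To make the arithmetic above concrete, here is a minimal sketch in Java. The class and method names are made up for illustration and are not part of Giraph. It assumes one communication thread per peer worker and one worker per JVM, and it counts reserved stack space rather than memory actually touched.

```java
// Hypothetical helper, not Giraph code: estimates reserved thread stack space.
final class StackFootprint {
    private StackFootprint() { }

    /** Bytes reserved for thread stacks inside one worker JVM. */
    static long perWorkerBytes(int threadsPerWorker, long stackBytesPerThread) {
        return threadsPerWorker * stackBytesPerThread;
    }

    /** Bytes reserved across the job when every worker runs one thread per worker. */
    static long aggregateBytes(int workers, long stackBytesPerThread) {
        return workers * perWorkerBytes(workers, stackBytesPerThread);
    }

    public static void main(String[] args) {
        int workers = 1024;
        long oneMb = 1L << 20;        // the assumed default stack size
        long sixtyFourKb = 64L << 10; // the -Xss64k setting

        System.out.println("1 MB stacks, per worker (MB): "
            + (perWorkerBytes(workers, oneMb) >> 20));
        System.out.println("1 MB stacks, aggregate (GB): "
            + (aggregateBytes(workers, oneMb) >> 30));
        System.out.println("64 KB stacks, per worker (MB): "
            + (perWorkerBytes(workers, sixtyFourKb) >> 20));
        System.out.println("64 KB stacks, aggregate (GB): "
            + (aggregateBytes(workers, sixtyFourKb) >> 30));
    }
}
```

With 1024 workers this gives 1 GB per worker and 1 TB in aggregate at 1 MB per stack, versus 64 MB per worker and 64 GB in aggregate at 64 KB per stack.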

The issue is that many clusters (including Yahoo!'s) are running only 
32-bit JVMs.  So if you are using 1 GB just for stack space, you only get so 
much left for heap (graph + messages).  I think this should help quite a bit 
until GIRAPH-37 is taken on. 
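
For reference, a per-thread stack size can also be requested in code instead of through a JVM-wide -Xss flag. The sketch below is only an illustration under that assumption and is not taken from the attached patches. The class name, thread names and the 64 KB figure are hypothetical. The JVM treats the stackSize argument as a hint and may round it up or ignore it on some platforms.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example, not Giraph code: threads created with a small stack request.
final class SmallStackThreads {
    /** Requested stack size per thread; the JVM treats this only as a suggestion. */
    static final long STACK_BYTES = 64 * 1024;

    private SmallStackThreads() { }

    /** Creates (but does not start) a thread that asks for a small stack. */
    static Thread newSmallStackThread(Runnable task, String name) {
        // Thread(ThreadGroup, Runnable, String, long stackSize) is a standard JDK constructor.
        return new Thread(null, task, name, STACK_BYTES);
    }

    /** Starts 'count' small-stack threads and returns how many ran to completion. */
    static int runAll(int count) {
        AtomicInteger completed = new AtomicInteger();
        Thread[] threads = new Thread[count];
        for (int i = 0; i < count; i++) {
            threads[i] = newSmallStackThread(completed::incrementAndGet, "comm-" + i);
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for threads", e);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("threads completed: " + runAll(8));
    }
}
```

A thread whose work needs deep recursion can still overflow a small stack, so a setting like this would need testing against the real communication code.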

Can you run the unit tests against a real Hadoop instance as well?  Then I'd say 
+1, unless someone disagrees.
> Investigate communication improvements
> --------------------------------------
>                 Key: GIRAPH-12
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-12
>             Project: Giraph
>          Issue Type: Improvement
>          Components: bsp
>            Reporter: Avery Ching
>            Assignee: Hyunsik Choi
>            Priority: Minor
>         Attachments: GIRAPH-12_1.patch, GIRAPH-12_2.patch
> Currently every worker will start up a thread to communicate with every other 
> worker.  Hadoop RPC is used for communication.  For instance if there are 
> 400 workers, each worker will create 400 threads.  This ends up using a lot 
> of memory, even with the option  
> -Dmapred.child.java.opts="-Xss64k".  
> It would be good to investigate using a framework like Netty, or rolling 
> our own, to improve this situation.  By moving away from Hadoop RPC, we would 
> also make compatibility across different Hadoop versions easier.


