Hyunsik Choi commented on GIRAPH-12:

I have been thinking about question 3, that is, how we can measure memory 
usage while Giraph is running.

Probably the most basic way is to use Hadoop metrics 
(http://www.cloudera.com/blog/2009/03/hadoop-metrics/). However, this requires 
changing the _hadoop-metrics.properties_ file, so it may not be allowed on most 
large clusters, e.g., the Yahoo! cluster that Avery can access. 

If that is not possible, we can implement a thread class that mimics Hadoop 
metrics: it would periodically measure the memory usage of the JVM and send the 
measurements to a specific remote server.
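
A minimal sketch of such a sampling thread is shown here, only to make the idea 
concrete. It reads heap and non-heap usage through the standard 
java.lang.management API on a fixed schedule. The class name MemorySampler, 
the line format, and the Consumer<String> sink are assumptions made for 
illustration and are not part of Giraph or Hadoop; the sink stands in for 
whatever would actually ship the measurements to the remote server.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper (not a Giraph or Hadoop class): samples JVM memory periodically. */
class MemorySampler implements AutoCloseable {
    private final ScheduledExecutorService scheduler;

    /**
     * @param periodMillis interval between samples
     * @param sink         receives one formatted line per sample; a placeholder
     *                     for the code that would send data to a remote server
     */
    MemorySampler(long periodMillis, Consumer<String> sink) {
        // Daemon thread so the sampler never keeps the worker JVM alive.
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "memory-sampler");
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleAtFixedRate(() -> {
            try {
                sink.accept(sample());
            } catch (RuntimeException e) {
                // Swallow sink failures: an uncaught exception would cancel
                // all later runs of a scheduled task.
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** Takes one reading of the current JVM memory usage, in bytes. */
    static String sample() {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = bean.getHeapMemoryUsage();
        MemoryUsage nonHeap = bean.getNonHeapMemoryUsage();
        return String.format("ts=%d heapUsed=%d heapCommitted=%d nonHeapUsed=%d",
                System.currentTimeMillis(), heap.getUsed(),
                heap.getCommitted(), nonHeap.getUsed());
    }

    /** Stops sampling. */
    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

A worker could create one instance at startup, for example 
new MemorySampler(5000, System.err::println), and close it at shutdown. Note 
that thread stacks are not counted in the heap figure, so for this issue the 
non-heap and OS-level numbers may matter as much as heap usage.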

What do you think about that?

> Investigate communication improvements
> --------------------------------------
>                 Key: GIRAPH-12
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-12
>             Project: Giraph
>          Issue Type: Improvement
>          Components: bsp
>            Reporter: Avery Ching
>            Assignee: Hyunsik Choi
>            Priority: Minor
>         Attachments: GIRAPH-12_1.patch, GIRAPH-12_2.patch
> Currently every worker will start up a thread to communicate with every other 
> worker.  Hadoop RPC is used for communication.  For instance if there are 
> 400 workers, each worker will create 400 threads.  This ends up using a lot 
> of memory, even with the option  
> -Dmapred.child.java.opts="-Xss64k".  
> It would be good to investigate using frameworks like Netty, or rolling 
> our own, to improve this situation.  By moving away from Hadoop RPC, we would 
> also make compatibility with different Hadoop versions easier.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

