[
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114709#comment-13114709
]
Hyunsik Choi commented on GIRAPH-12:
------------------------------------
I have thought about question 3, that is, how we can measure memory usage
while Giraph is running.
Probably the most basic way is to use Hadoop metrics
(http://www.cloudera.com/blog/2009/03/hadoop-metrics/). However, this requires
changing the _hadoop-metrics.properties_ file, so it may not be possible on
most large clusters; e.g., the Yahoo! cluster that Avery can access.
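For reference, the change would be roughly the following: switch the "jvm"
metrics context from the null context to the file context. The property names
are the ones shipped (commented out) in the default file of the older metrics
framework. The output path is only an example, and the 10-second period is an
assumption to be tuned.

```properties
# hadoop-metrics.properties (sketch; path and period are example values)
# Write JVM metrics, including heap usage, to a local file every 10 seconds.
jvm.class=org.apache.hadoop.metrics.file.FileContext
jvm.period=10
jvm.fileName=/tmp/jvmmetrics.log
```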
If that is not possible, we can implement a thread class that mimics Hadoop
metrics: it would periodically measure the memory usage of the JVM and send it
to a specific remote server.
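A minimal sketch of such a thread, to make the idea concrete. Everything here
is hypothetical and not taken from any patch: the class name MemoryReporter,
the one-line text format, and the plain TCP collector at host:port are all
assumptions. Only the standard java.lang.management API is used to read heap
usage.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: periodically reports JVM heap usage to a remote collector. */
final class MemoryReporter implements Runnable {
    private final String workerId;   // assumed identifier for this worker
    private final String host;       // assumed collector host
    private final int port;          // assumed collector port
    private final long periodMillis; // sampling period

    MemoryReporter(String workerId, String host, int port, long periodMillis) {
        this.workerId = workerId;
        this.host = host;
        this.port = port;
        this.periodMillis = periodMillis;
    }

    /** Formats one sample as "workerId timestamp usedBytes committedBytes". */
    static String formatSample(String workerId, long timestampMillis,
                               long usedBytes, long committedBytes) {
        return workerId + " " + timestampMillis + " " + usedBytes + " " + committedBytes;
    }

    /** Reads the current heap usage of this JVM and formats it. */
    String sampleNow() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return formatSample(workerId, System.currentTimeMillis(),
                            heap.getUsed(), heap.getCommitted());
    }

    /** Opens a connection, sends a single sample line, and closes it. */
    void sendOnce() throws IOException {
        try (Socket socket = new Socket(host, port);
             PrintWriter out = new PrintWriter(
                 new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true)) {
            out.println(sampleNow());
        }
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                sendOnce();
            } catch (IOException e) {
                // Collector unreachable: drop this sample rather than fail the worker.
            }
            try {
                Thread.sleep(periodMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop sampling on interrupt
            }
        }
    }

    /** Starts the reporter as a daemon thread so it never blocks JVM shutdown. */
    static Thread start(String workerId, String host, int port, long periodMillis) {
        Thread t = new Thread(new MemoryReporter(workerId, host, port, periodMillis),
                              "memory-reporter");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

Each worker would call something like MemoryReporter.start("worker-1",
"collector.example.com", 9999, 10000L) once at startup. Reconnecting for every
sample keeps the sketch simple; a real version would probably keep one
connection open or send UDP packets instead.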
What do you think about that?
> Investigate communication improvements
> --------------------------------------
>
> Key: GIRAPH-12
> URL: https://issues.apache.org/jira/browse/GIRAPH-12
> Project: Giraph
> Issue Type: Improvement
> Components: bsp
> Reporter: Avery Ching
> Assignee: Hyunsik Choi
> Priority: Minor
> Attachments: GIRAPH-12_1.patch, GIRAPH-12_2.patch
>
>
> Currently every worker will start up a thread to communicate with every other
> worker. Hadoop RPC is used for communication. For instance if there are
> 400 workers, each worker will create 400 threads. This ends up using a lot
> of memory, even with the option
> -Dmapred.child.java.opts="-Xss64k".
> It would be good to investigate using frameworks like Netty or rolling our
> own to improve this situation. By moving away from Hadoop RPC, we would
> also make compatibility with different Hadoop versions easier.