[ https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116253#comment-13116253 ]
Dmitriy V. Ryaboy commented on GIRAPH-12: ----------------------------------------- Julien has a nice post describing how one goes about detecting low memory conditions: https://techblug.wordpress.com/2011/07/21/detecting-low-memory-in-java-part-2/ . The first thing to do when this happens is probably to run through combiners to attempt to free some memory. Assuming we still need to do something with the messages, there are two approaches that come to mind: 1) Spill to disk and keep track of spilled messages. This is going to cost us, but it'll make it possible to make progress when otherwise an OOM error would occur. 2) Send the messages to intended recipients instead of spilling to disk. That will be speedier, but does run the risk of the other side being out of memory and unable to accumulate, too. Either way, this ticket is more about reworking the communication code than about memory improvements -- we haven't measured how much memory individual threads are taking up, but I am betting their impact is dwarfed by buffered messages and the in-memory graph segment we are working on, so it would be surprising if we could substantially reduce the amount of memory by simply switching to thread pools. Let's open a separate JIRA to deal with message accumulation better, and consider this code on merits other than memory footprint. > Investigate communication improvements > -------------------------------------- > > Key: GIRAPH-12 > URL: https://issues.apache.org/jira/browse/GIRAPH-12 > Project: Giraph > Issue Type: Improvement > Components: bsp > Reporter: Avery Ching > Assignee: Hyunsik Choi > Priority: Minor > Attachments: GIRAPH-12_1.patch, GIRAPH-12_2.patch > > > Currently every worker will start up a thread to communicate with every other > workers. Hadoop RPC is used for communication. For instance if there are > 400 workers, each worker will create 400 threads. This ends up using a lot > of memory, even with the option > -Dmapred.child.java.opts="-Xss64k". > It would be good to investigate using frameworks like Netty or custom roll > our own to improve this situation. By moving away from Hadoop RPC, we would > also make compatibility of different Hadoop versions easier. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira