Dmitriy V. Ryaboy commented on GIRAPH-12:
Julien has a nice post describing how one goes about detecting low memory.
The first thing to do when this happens is probably to run through combiners
to attempt to free some memory.
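As a rough illustration of low-memory detection on the JVM, here is a minimal sketch using the standard java.lang.management usage-threshold notifications. This is one common mechanism and may or may not be what the post describes. LowMemoryWatcher, install, and the onLowMemory callback are hypothetical names, not Giraph code; the callback is where a combiner pass could be triggered.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;

/** Hypothetical helper (not part of Giraph): runs a callback when heap usage crosses a threshold. */
final class LowMemoryWatcher {
  private LowMemoryWatcher() {}

  /**
   * Arms a usage threshold at the given fraction of the maximum size of every
   * heap pool that supports one, and registers a listener that runs
   * onLowMemory when a threshold is exceeded. Returns the number of pools armed.
   */
  static int install(double fraction, Runnable onLowMemory) {
    if (fraction <= 0.0 || fraction > 1.0) {
      throw new IllegalArgumentException("fraction must be in (0, 1]");
    }
    int armed = 0;
    for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
      if (pool.getType() != MemoryType.HEAP || !pool.isUsageThresholdSupported()) {
        continue;
      }
      MemoryUsage usage = pool.getUsage();
      if (usage == null || usage.getMax() <= 0) {
        continue; // pool is invalid or has no defined maximum
      }
      pool.setUsageThreshold((long) (usage.getMax() * fraction));
      armed++;
    }
    // The platform MemoryMXBean emits the threshold notifications.
    NotificationEmitter emitter =
        (NotificationEmitter) ManagementFactory.getMemoryMXBean();
    NotificationListener listener = (notification, handback) -> {
      if (MemoryNotificationInfo.MEMORY_THRESHOLD_EXCEEDED
          .equals(notification.getType())) {
        onLowMemory.run(); // e.g. run combiners over the buffered messages
      }
    };
    emitter.addNotificationListener(listener, null, null);
    return armed;
  }
}
```

A worker could call LowMemoryWatcher.install(0.8, callback) once at startup. The 0.8 is an arbitrary figure for illustration, and the callback runs on a JVM-internal notification thread, so it should do little more than set a flag.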
Assuming we still need to do something with the messages, there are two
approaches that come to mind:
1) Spill to disk and keep track of spilled messages. This is going to cost us,
but it lets us make progress where we would otherwise hit an OOM error.
2) Send the messages to intended recipients instead of spilling to disk. That
will be speedier, but runs the risk that the other side is also out of memory
and unable to accumulate them.
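To make option 1 concrete, here is a minimal sketch of a buffer that keeps messages in memory up to a byte budget and spills the rest to a temporary file while counting what it spilled. SpillingMessageBuffer and all of its methods are hypothetical names for illustration, not existing Giraph code, and messages are assumed to be already serialized to byte arrays.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.Closeable;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch (not Giraph code): buffers messages in memory, spilling overflow to disk. */
final class SpillingMessageBuffer implements Closeable {
  private final long maxInMemoryBytes;
  private final List<byte[]> inMemory = new ArrayList<>();
  private long inMemoryBytes = 0;
  private Path spillFile;            // created lazily on first spill
  private DataOutputStream spillOut;
  private int spilledCount = 0;

  SpillingMessageBuffer(long maxInMemoryBytes) {
    this.maxInMemoryBytes = maxInMemoryBytes;
  }

  /** Keeps the message in memory if it fits the budget, otherwise appends it to the spill file. */
  void add(byte[] message) throws IOException {
    if (inMemoryBytes + message.length <= maxInMemoryBytes) {
      inMemory.add(message);
      inMemoryBytes += message.length;
      return;
    }
    if (spillOut == null) {
      spillFile = Files.createTempFile("messages", ".spill");
      spillOut = new DataOutputStream(
          new BufferedOutputStream(Files.newOutputStream(spillFile)));
    }
    spillOut.writeInt(message.length); // length-prefixed record
    spillOut.write(message);
    spilledCount++;
  }

  /** Number of messages currently held on disk. */
  int spilledCount() {
    return spilledCount;
  }

  /** Hands in-memory messages, then spilled ones, to the sink one at a time and resets the buffer. */
  void drainTo(Consumer<byte[]> sink) throws IOException {
    inMemory.forEach(sink);
    inMemory.clear();
    inMemoryBytes = 0;
    if (spillOut != null) {
      spillOut.close();
      spillOut = null;
      try (DataInputStream in = new DataInputStream(
          new BufferedInputStream(Files.newInputStream(spillFile)))) {
        for (int i = 0; i < spilledCount; i++) {
          byte[] message = new byte[in.readInt()];
          in.readFully(message);
          sink.accept(message);
        }
      }
      Files.delete(spillFile);
      spillFile = null;
      spilledCount = 0;
    }
  }

  @Override
  public void close() throws IOException {
    if (spillOut != null) {
      spillOut.close();
      spillOut = null;
    }
    if (spillFile != null) {
      Files.deleteIfExists(spillFile);
      spillFile = null;
    }
  }
}
```

The sketch does not preserve arrival order across the memory/disk boundary and is not thread-safe. Option 2 would replace the spill branch in add with an immediate send to the recipient.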
Either way, this ticket is more about reworking the communication code than
about memory improvements -- we haven't measured how much memory individual
threads are taking up, but I am betting their impact is dwarfed by buffered
messages and the in-memory graph segment we are working on, so it would be
surprising if we could substantially reduce the amount of memory by simply
switching to thread pools.
Let's open a separate JIRA to deal with message accumulation better, and
consider this code on merits other than memory footprint.
> Investigate communication improvements
> Key: GIRAPH-12
> URL: https://issues.apache.org/jira/browse/GIRAPH-12
> Project: Giraph
> Issue Type: Improvement
> Components: bsp
> Reporter: Avery Ching
> Assignee: Hyunsik Choi
> Priority: Minor
> Attachments: GIRAPH-12_1.patch, GIRAPH-12_2.patch
> Currently every worker will start up a thread to communicate with every other
> worker. Hadoop RPC is used for communication. For instance, if there are
> 400 workers, each worker will create 400 threads. This ends up using a lot
> of memory, even with the option
> It would be good to investigate using frameworks like Netty, or rolling
> our own, to improve this situation. By moving away from Hadoop RPC, we would
> also make compatibility across different Hadoop versions easier.