[
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyunsik Choi updated GIRAPH-12:
-------------------------------
Attachment: GIRAPH-12_1.patch
As Avery mentioned, in the current architecture, each worker requires N threads
that communicate with N remote peers. This may incur severe context-switching
overheads (especially when all messages are flushed) and more memory
consumption. Firstly, I considered about replacing RPC system to another one.
However, it is not simple work. I need more time.
Instead, I have considered an alternative way to employ ThreadPoolExecutor in
order to adjust active threads. When Giraph deals with large graphs, the
performance of Giraph is usually bounded on network bandwidth. I think that
this approach would be effective. In addition, I tried to reduce the
synchronization area, where BasicRPCCommunicator (374-394 lines) sends large
buffered messages to specific peers.
I attached the patch in progress. Now, I cannot access to real hadoop cluster
for one week. I didn't test this in real cluster. Besides, all unit test are
passed.
How about this approach? Could you review this?
> Investigate communication improvements
> --------------------------------------
>
> Key: GIRAPH-12
> URL: https://issues.apache.org/jira/browse/GIRAPH-12
> Project: Giraph
> Issue Type: Improvement
> Reporter: Avery Ching
> Assignee: Hyunsik Choi
> Priority: Minor
> Attachments: GIRAPH-12_1.patch
>
>
> Currently every worker will start up a thread to communicate with every other
> workers. Hadoop RPC is used for communication. For instance if there are
> 400 workers, each worker will create 400 threads. This ends up using a lot
> of memory, even with the option
> -Dmapred.child.java.opts="-Xss64k".
> It would be good to investigate using frameworks like Netty or custom roll
> our own to improve this situation. By moving away from Hadoop RPC, we would
> also make compatibility of different Hadoop versions easier.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira