[
https://issues.apache.org/jira/browse/HAMA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431653#comment-13431653
]
Thomas Jungblut commented on HAMA-593:
--------------------------------------
In both of the managers (Hadoop|Avro), the RPC connections to the other peers
are cached in a map called "peers".
This leaves a connection open to every other task for the whole job. Imagine you
have 1k tasks: each task will hold 1k connections (999 outbound, 1
local).
1. The caching must be removed and the connections must be closed once the
messages have been sent.
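
A minimal sketch of what point 1 could look like: open a connection, send,
close, then move on to the next peer. PeerConnection, ConnectionFactory and
SequentialSender are hypothetical names made up for illustration; they are not
Hama, Hadoop RPC or Avro types, and the real managers would use their own proxy
objects instead.

```java
import java.util.List;
import java.util.Map;

public class SequentialSender {

  /** Hypothetical stand-in for an RPC proxy to one peer; not a Hama type. */
  public interface PeerConnection extends AutoCloseable {
    void send(List<String> messages) throws Exception;

    @Override
    void close() throws Exception;
  }

  /** Hypothetical factory that opens a fresh connection to the named peer. */
  public interface ConnectionFactory {
    PeerConnection open(String peerName) throws Exception;
  }

  /**
   * Sends each peer's outgoing messages over a short-lived connection.
   * At most one connection is open at a time and nothing is cached.
   */
  public static void flush(Map<String, List<String>> outgoing,
                           ConnectionFactory factory) throws Exception {
    for (Map.Entry<String, List<String>> entry : outgoing.entrySet()) {
      if (entry.getValue().isEmpty()) {
        continue; // nothing to send, so do not connect at all
      }
      // try-with-resources closes the connection even if send() throws
      try (PeerConnection conn = factory.open(entry.getKey())) {
        conn.send(entry.getValue());
      }
    }
  }
}
```

The trade-off is that every superstep pays the connection setup cost again for
each peer it talks to, in exchange for never holding N connections at once.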
Then there is a problem when all 1k peers attempt to send to a single
peer (let's say a master task in a graph algorithm that aggregates). In this
case the peer will start 1k threads, which uses an enormous amount of memory.
I wouldn't mind if this can be done smarter, but I have no solution at hand
currently.
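
One possible direction for the receiving side, purely as a sketch and not a
worked-out solution: hand incoming requests to a fixed-size pool so the thread
count stays bounded no matter how many peers send at once. BoundedReceiver and
onIncoming are hypothetical names; this is plain java.util.concurrent and does
not reflect how the Hadoop or Avro RPC servers dispatch their handlers.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedReceiver {
  private final ExecutorService workers;
  private final AtomicInteger active = new AtomicInteger();
  private final AtomicInteger peakActive = new AtomicInteger();
  private final AtomicInteger handled = new AtomicInteger();

  public BoundedReceiver(int maxThreads) {
    // A fixed pool never creates more than maxThreads worker threads;
    // requests beyond that wait in the pool's queue.
    this.workers = Executors.newFixedThreadPool(maxThreads);
  }

  /** Hypothetical entry point, called once per incoming peer request. */
  public void onIncoming(Runnable handler) {
    workers.execute(() -> {
      int now = active.incrementAndGet();
      peakActive.accumulateAndGet(now, Math::max); // track concurrency peak
      try {
        handler.run();
        handled.incrementAndGet();
      } finally {
        active.decrementAndGet();
      }
    });
  }

  /** Stops accepting work and waits; returns true if everything finished. */
  public boolean shutdownAndWait(long timeoutSeconds) {
    workers.shutdown();
    try {
      return workers.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public int peakActive() { return peakActive.get(); }

  public int handled() { return handled.get(); }
}
```

Note that newFixedThreadPool queues waiting requests without a limit, so this
only trades 1k thread stacks for 1k queued tasks; a bounded queue with some
form of back-pressure would be needed to cap memory completely.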
Would be cool if 1. could be resolved :)
> Improve RPC scalability
> -----------------------
>
> Key: HAMA-593
> URL: https://issues.apache.org/jira/browse/HAMA-593
> Project: Hama
> Issue Type: Sub-task
> Components: bsp core, messaging
> Affects Versions: 0.5.0
> Reporter: Thomas Jungblut
>
> To improve scalability we can open one RPC connection after another instead of
> keeping all N possible connections open the whole time.