[
https://issues.apache.org/jira/browse/HAMA-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490796#comment-13490796
]
Thomas Jungblut commented on HAMA-629:
--------------------------------------
I'm reading into Erlang these days and they deal with this problem very
interestingly (actually very simple and naive).
They are building a hierachy of nodes in your cluster. A subcluster contains 10
nodes (maybe the number of nodes in a rack?), they are completely
interconnected with each other, so the are holding a RPC or socket connection
the whole time.
If messages need to be send outside the subcluster, they are sending it to an
aggregator (or multiple aggregators) and the aggregators will send to the real
destination.
That's how they archieve scalability, certainly a lot of TCP connections are a
big problem there as well.
> Improve RPC Scalability Part 2
> ------------------------------
>
> Key: HAMA-629
> URL: https://issues.apache.org/jira/browse/HAMA-629
> Project: Hama
> Issue Type: Sub-task
> Components: bsp core, messaging
> Affects Versions: 0.5.0
> Reporter: Thomas Jungblut
>
> There is a problem when all 1k peers would attempt to send to a single peer
> (let's say a master task in a graph algorithm that aggregates). In this case
> the peer will start 1k-threads which is using enourmous amount of memory.
> I think we can coordinate the message sending either with Zookeeper or by
> using the task id and do a smarter sending chain.
> By the last, I mean, that each task can start at a different offset in the
> peer array to start sending messages to the other peers. But this won't solve
> the problem DDoS'ing a single master task.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira