[ https://issues.apache.org/jira/browse/HAMA-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925637#comment-13925637 ]
Anastasis Andronidis commented on HAMA-629:
-------------------------------------------

Hi,

as this issue concerns only the aggregators in the Graph API, I would say that the best solution may be to make the aggregation framework work like a d-heap [1].

It would be a scalable solution to have each BSP peer send its aggregation result to an intermediate peer (a proxy peer), which combines its own local result with the results it receives. The proxy peer then sends the combined result to another peer one level higher, and so on until all results reach the master aggregator.

We can pre-define the number of levels of the heap, and we can easily map the structure of the d-heap onto the peers through its array form: the peers are already kept in an array, so the index arithmetic described in [2] gives us the mapping.

The only problem I see with this solution is that each level of the heap/tree needs its own superstep, so it will need careful tuning to be fast.

[1]: http://en.wikipedia.org/wiki/D-ary_heap
[2]: http://en.wikipedia.org/wiki/K-ary_tree#Arrays

> Improve RPC Scalability Part 2
> ------------------------------
>
>                 Key: HAMA-629
>                 URL: https://issues.apache.org/jira/browse/HAMA-629
>             Project: Hama
>          Issue Type: Sub-task
>          Components: graph
>    Affects Versions: 0.5.0
>            Reporter: Thomas Jungblut
>             Fix For: 0.7.0
>
>
> There is a problem when all 1k peers attempt to send to a single peer
> (let's say a master task in a graph algorithm that aggregates). In this case
> the peer will start 1k threads, which use an enormous amount of memory.
> I think we can coordinate the message sending either with Zookeeper or by
> using the task id and do a smarter sending chain.
> By the latter, I mean that each task can start at a different offset in the
> peer array when sending messages to the other peers. But this won't solve
> the problem of DDoS'ing a single master task.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
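
For illustration, the array-form mapping proposed in the comment can be sketched in plain Java. This is only a sketch under stated assumptions, not Hama code: the class `DHeapAggregation` and its methods are hypothetical names, peer 0 is assumed to be the master aggregator, the aggregation is assumed to be a simple sum, and the loop over tree levels merely stands in for real supersteps.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Hama API: maps peers onto a d-ary tree
// stored in array form, with peer 0 assumed to be the master aggregator.
public class DHeapAggregation {

    /** Index of the proxy (parent) peer of peer i; only meaningful for i > 0. */
    static int parent(int i, int d) {
        return (i - 1) / d;
    }

    /** Indices of the peers that send their partial results to peer i. */
    static List<Integer> children(int i, int d, int numPeers) {
        List<Integer> result = new ArrayList<>();
        for (int k = 1; k <= d; k++) {
            long child = (long) d * i + k; // long avoids int overflow for large indices
            if (child < numPeers) {
                result.add((int) child);
            }
        }
        return result;
    }

    /** Tree level of peer i; the root (master) is at level 0. */
    static int depth(int i, int d) {
        int level = 0;
        while (i > 0) {
            i = parent(i, d);
            level++;
        }
        return level;
    }

    /**
     * Simulates a sum aggregation bottom-up. Each iteration of the outer loop
     * stands in for one superstep: every peer on that level "sends" its
     * partial sum to its parent.
     */
    static long aggregateSum(long[] localValues, int d) {
        if (localValues.length == 0 || d < 1) {
            throw new IllegalArgumentException("need at least one peer and d >= 1");
        }
        int n = localValues.length;
        long[] partial = localValues.clone();
        // Array form is level order, so the last index lies on the deepest level.
        int maxDepth = depth(n - 1, d);
        for (int level = maxDepth; level > 0; level--) {
            for (int i = n - 1; i > 0; i--) {
                if (depth(i, d) == level) {
                    partial[parent(i, d)] += partial[i];
                }
            }
        }
        return partial[0];
    }

    public static void main(String[] args) {
        int numPeers = 1000;
        int d = 10;
        long[] ones = new long[numPeers];
        java.util.Arrays.fill(ones, 1L);
        System.out.println("levels below master: " + depth(numPeers - 1, d));
        System.out.println("aggregated sum: " + aggregateSum(ones, d));
    }
}
```

With these assumptions, 1,000 peers and d = 10 put the last peer three levels below the master, so the aggregation would take three extra supersteps while no peer receives from more than ten senders. That is the trade-off the comment points out: a larger d means fewer supersteps but more concurrent senders per proxy peer.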