If I remember correctly, in Hadoop's case, the MR framework merges and sorts intermediate data files by key between the map and reduce functions. If we provide this function, I think we can solve the disk queue, message grouping, and message sorting at once.
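
To make the merge-and-sort idea concrete, here is a rough sketch in Python rather than Hama's Java. It is only an illustration under stated assumptions: the function names (spill_sorted_runs, merge_and_group) are hypothetical and not part of any Hama or Hadoop API, messages are plain (vertex_id, payload) tuples, and the "spilled" runs live in memory instead of on local disk. Each run is sorted on its own, and the receiver does a k-way merge so that all messages for one vertex ID come out together.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def spill_sorted_runs(messages, run_size):
    """Cut the message stream into runs of at most run_size and sort each run
    by vertex ID. A real spilling queue would write each run to local disk;
    here the runs stay in memory (assumption made for brevity)."""
    runs = []
    for start in range(0, len(messages), run_size):
        run = sorted(messages[start:start + run_size], key=itemgetter(0))
        runs.append(run)
    return runs


def merge_and_group(runs):
    """K-way merge the sorted runs, then yield (vertex_id, [payloads]) so that
    each vertex sees all of its messages in one group."""
    merged = heapq.merge(*runs, key=itemgetter(0))
    for vertex_id, group in groupby(merged, key=itemgetter(0)):
        yield vertex_id, [payload for _, payload in group]


# Six messages addressed to three vertices, spilled two at a time.
messages = [(3, "a"), (1, "b"), (2, "c"), (1, "d"), (3, "e"), (2, "f")]
grouped = list(merge_and_group(spill_sorted_runs(messages, run_size=2)))
print(grouped)
```

The merge step only needs one pending message per run at a time, which is why the same structure could in principle cover the disk queue, grouping, and sorting together.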
BTW, can we specify the queue type per job?

On Thu, Jan 31, 2013 at 4:20 PM, Suraj Menon <[email protected]> wrote:
> Thanks for bringing up our discussion online.
>
> For 1. Let's implement something within bsp-core that could be re-used by
> graph package. [HAMA-724]
>
> For 2. For sorted queue, it would be expensive to do all the sorting on the
> sender side. We need to have a send protocol and the receive protocol
> (merge sort) [HAMA-722][HAMA-723]
>
> Regards,
> Suraj
>
> On Wed, Jan 30, 2013 at 3:05 AM, Edward J. Yoon <[email protected]> wrote:
>
>> Hi devs,
>>
>> As you know, many people report OOM problems with graph algorithms.
>> It is about handling messages. I roughly think that every vertex can
>> send or receive as many messages as the number of outgoing or incoming
>> links. For example, you know, Barack Obama has 26,000,000+
>> followers.
>>
>> I believe the issue of message queue will be fixed by adding spilling
>> queue. Another issue is the grouping messages by vertex ID[1]. To
>> solve this issue, I'm thinking about two ways: 1) Support grouping
>> function of key-value pair messages in BSP framework (like
>> Map/Reduce). 2) Write messages and sort by vertex ID on local disk
>> (external merge sort).
>>
>> If you have any ideas or suggestions, please let me know.
>>
>> 1. https://issues.apache.org/jira/browse/HAMA-704
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>>

--
Best Regards, Edward J. Yoon
@eddieyoon
