P.S. IMO, our future default queue should be a spilling queue, and the message merge-sort function should be optional.
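
To make the idea concrete, here is a rough sketch in Python. It is not Hama code: the class name `SpillingQueue`, its parameters (`max_in_memory`, `sort_runs`) and the pickle-based run files are all made up for illustration. It only shows how a queue could spill to local disk once memory fills up, with sorting and grouping by vertex ID as an opt-in step.

```python
import heapq
import itertools
import os
import pickle
import tempfile


class SpillingQueue:
    """Hypothetical sketch: buffers (vertex_id, message) pairs in memory
    and spills them to a temp file once `max_in_memory` is reached."""

    def __init__(self, max_in_memory=1000, sort_runs=False):
        self.max_in_memory = max_in_memory
        self.sort_runs = sort_runs  # the merge-sort step is optional
        self.buffer = []
        self.run_paths = []  # one temp file per spilled run

    def add(self, vertex_id, message):
        self.buffer.append((vertex_id, message))
        if len(self.buffer) >= self.max_in_memory:
            self._spill()

    def _spill(self):
        # When sorting is enabled, each run is written in vertex-ID order
        # so that the runs can later be merged without loading them fully.
        if self.sort_runs:
            self.buffer.sort(key=lambda pair: pair[0])
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "wb") as out:
            for pair in self.buffer:
                pickle.dump(pair, out)
        self.run_paths.append(path)
        self.buffer = []

    @staticmethod
    def _read_run(path):
        # Stream one spilled run back, one pair at a time.
        with open(path, "rb") as src:
            while True:
                try:
                    yield pickle.load(src)
                except EOFError:
                    return

    def drain(self):
        """Yield every pair; in vertex-ID order only if sort_runs is set."""
        runs = [self._read_run(p) for p in self.run_paths]
        if self.sort_runs:
            self.buffer.sort(key=lambda pair: pair[0])
            runs.append(iter(self.buffer))
            merged = heapq.merge(*runs, key=lambda pair: pair[0])  # k-way merge
        else:
            runs.append(iter(self.buffer))
            merged = itertools.chain(*runs)  # plain FIFO order
        yield from merged
        # Clean up once everything has been consumed.
        for p in self.run_paths:
            os.remove(p)
        self.run_paths, self.buffer = [], []

    def drain_grouped(self):
        """Yield (vertex_id, [messages]); needs sorted runs."""
        if not self.sort_runs:
            raise ValueError("grouping requires sort_runs=True")
        for vid, pairs in itertools.groupby(self.drain(), key=lambda p: p[0]):
            yield vid, [msg for _, msg in pairs]
```

With `sort_runs=False` the queue behaves like a plain FIFO that happens to overflow to disk. With `sort_runs=True` the receiver can iterate `drain_grouped()` and see all messages for one vertex together, which is roughly the external merge sort idea from the thread below.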

On Thu, Jan 31, 2013 at 5:12 PM, Edward J. Yoon <[email protected]> wrote:
> If I remember correctly, in Hadoop's case, the MR framework merges and
> sorts intermediate data files by key between the map and reduce functions.
> If we provide this function, I think we can solve the disk queue,
> message grouping and message sorting at once.
>
> BTW, can we specify the queue type per job?
>
> On Thu, Jan 31, 2013 at 4:20 PM, Suraj Menon <[email protected]> wrote:
>> Thanks for bringing up our discussion online.
>>
>> For 1. Let's implement something within bsp-core that could be re-used by
>> the graph package. [HAMA-724]
>>
>> For 2. For a sorted queue, it would be expensive to do all the sorting on the
>> sender side. We need to have a send protocol and a receive protocol
>> (merge sort). [HAMA-722][HAMA-723]
>>
>> Regards,
>> Suraj
>>
>> On Wed, Jan 30, 2013 at 3:05 AM, Edward J. Yoon <[email protected]> wrote:
>>
>>> Hi devs,
>>>
>>> As you know, many people report OOM problems with graph algorithms.
>>> It is about handling messages. I roughly think that every vertex can
>>> send or receive as many messages as the number of outgoing or incoming
>>> links. For example, you know, Barack Obama has 26,000,000+
>>> followers.
>>>
>>> I believe the message queue issue will be fixed by adding a spilling
>>> queue. Another issue is grouping messages by vertex ID[1]. To
>>> solve this issue, I'm thinking about two ways: 1) Support a grouping
>>> function for key-value pair messages in the BSP framework (like
>>> Map/Reduce). 2) Write messages and sort them by vertex ID on local disk
>>> (external merge sort).
>>>
>>> If you have any ideas or suggestions, please let me know.
>>>
>>> 1. https://issues.apache.org/jira/browse/HAMA-704
>>>
>>> --
>>> Best Regards, Edward J. Yoon
>>> @eddieyoon
>>>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon

--
Best Regards, Edward J. Yoon
@eddieyoon
