To begin with, we should have a spilling queue and, if possible, the same queue with a combiner running in batch. I have been looking into implementing the spilling queue. Sketching out the requirements, we should look into the following:
The queue should persist all of its data when the framework requires it for fault tolerance. (I feel it would be a bad idea for the framework to spend resources on making a separate copy of the file.)

Asynchronous communication is the next important feature we need for performance. Hence we would need this queue, with a combiner on the sender side, to batch the messages before sending. This implies that we need to support both concurrent reads and writes.

-Suraj

On Wed, Sep 19, 2012 at 4:21 AM, Thomas Jungblut <[email protected]> wrote:

> Oh okay, very interesting. Just another argument for making the messaging
> more scalable ;)
>
> 2012/9/19 Edward J. Yoon <[email protected]>
>
> > Didn't check memory usage because each machine's memory is 48 GB, but I
> > guess there's no big difference.
> >
> > In short, "bin/hama bench 16 10000 32" was the maximum capacity (see [1]).
> > If the number of messages or nodes is increased, the job always fails.
> > Hadoop RPC is OK.
> >
> > Will need time to debug this.
> >
> > 1. http://wiki.apache.org/hama/Benchmarks#Random_Communication_Benchmark
> >
> > On 9/19/2012 4:34 PM, Thomas Jungblut wrote:
> >
> >> BTW after HAMA-642 <https://issues.apache.org/jira/browse/HAMA-642>
> >> I will redesign our messaging system to be completely disk-based with
> >> caching. I will formulate a follow-up issue for this. However, I plan
> >> to get rid of the RPC anyway; I think it is more efficient to stream
> >> the messages from disk over the network to the other host via NIO (we
> >> can later replace it with netty). This also saves us the time spent on
> >> checkpointing, because the two can be combined pretty well. RPC
> >> requires the whole bundle to be in RAM, which is totally bad.
> >> Will follow with more details later.
> >>
> >> 2012/9/19 Thomas Jungblut <[email protected]>:
> >>
> >>> What is more memory efficient?
> >>>
> >>> On 19.09.2012 at 08:23, "Edward J. Yoon" <[email protected]> wrote:
> >>>
> >>>> Let's change the default value of RPC in hama-default.xml to Hadoop
> >>>> RPC. I am testing Hadoop RPC and Avro RPC on a 4-rack cluster. Avro
> >>>> RPC is criminal. There's no significant performance difference.
> >>>>
> >>>> --
> >>>> Best Regards, Edward J. Yoon
> >>>> @eddieyoon
> >
> > --
> > Best Regards, Edward J. Yoon
> > @eddieyoon
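
To make the spilling queue requirements at the top of this message more concrete, here is a minimal sketch in Java. It is only an illustration under stated assumptions, not Hama code: the class name SpillingQueue, the methods add, poll, pollBatch, size, and spilledCount, and the memoryLimit parameter are all hypothetical, and messages are plain Strings rather than Hama's message types. It keeps a bounded head in memory, spills the overflow to a temporary file, and uses synchronized methods so that one thread can write while another reads. It does not persist the in-memory head, so it covers the spilling part only, not the fault tolerance requirement.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical FIFO queue: bounded head in memory, overflow spilled to disk. */
class SpillingQueue implements AutoCloseable {
  private final int memoryLimit;                       // max messages held in RAM
  private final ArrayDeque<String> head = new ArrayDeque<>();
  private final File spillFile;
  private final RandomAccessFile spill;
  private long readPos = 0;                            // next byte to read from the spill file
  private long writePos = 0;                           // next byte to write to the spill file
  private int spilled = 0;                             // messages currently on disk

  SpillingQueue(int memoryLimit) throws IOException {
    this.memoryLimit = memoryLimit;
    this.spillFile = File.createTempFile("spill", ".queue");
    this.spillFile.deleteOnExit();
    this.spill = new RandomAccessFile(spillFile, "rw");
  }

  /** Appends a message; goes to disk once memory is full or a spill is pending. */
  synchronized void add(String message) throws IOException {
    // Writing to memory while older messages sit on disk would break FIFO order.
    if (spilled == 0 && head.size() < memoryLimit) {
      head.addLast(message);
      return;
    }
    spill.seek(writePos);
    spill.writeUTF(message);                           // writeUTF is limited to 65535 encoded bytes
    writePos = spill.getFilePointer();
    spilled++;
  }

  /** Removes and returns the oldest message, or null if the queue is empty. */
  synchronized String poll() throws IOException {
    if (!head.isEmpty()) {
      return head.pollFirst();
    }
    if (spilled == 0) {
      return null;
    }
    spill.seek(readPos);
    String message = spill.readUTF();
    readPos = spill.getFilePointer();
    if (--spilled == 0) {                              // disk drained: reuse the file from the start
      spill.setLength(0);
      readPos = 0;
      writePos = 0;
    }
    return message;
  }

  /** Removes up to max messages in FIFO order, e.g. to hand one batch to a combiner. */
  synchronized List<String> pollBatch(int max) throws IOException {
    List<String> batch = new ArrayList<>();
    String message;
    while (batch.size() < max && (message = poll()) != null) {
      batch.add(message);
    }
    return batch;
  }

  synchronized int size() { return head.size() + spilled; }

  synchronized int spilledCount() { return spilled; }

  @Override
  public synchronized void close() throws IOException {
    spill.close();
    spillFile.delete();
  }
}
```

A sender thread could call pollBatch(n) in a loop and pass each batch through the combiner before sending, while the compute thread keeps calling add. A single lock is the simplest way to get correct concurrent reads and writes; a real implementation would probably want finer-grained locking and a binary message format.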
