Re: Hadoop RPC as a default

Thomas Jungblut Wed, 19 Sep 2012 01:54:23 -0700

I will give you more details on what I planned for the interface changes once
I'm back from my lecture.


2012/9/19 Suraj Menon <[email protected]>

> As a start, we should have a spilling queue, and likewise the combiner
> running in batch if possible.
> I have been looking into implementing the spilling queue. Chalking out the
> requirements, we should look into the following:
>
> A queue should persist all the data if required by the framework for fault
> tolerance. (I feel it would be a bad idea for the framework to spend resources
> on making a separate copy of the file.)
> Asynchronous communication is our next important feature required for
> performance. Hence we would need this queue with a combiner on the sender side
> to batch the messages before sending. This implies we need to support both
> concurrent reads and writes.
>
> -Suraj
>
> On Wed, Sep 19, 2012 at 4:21 AM, Thomas Jungblut
> <[email protected]> wrote:
>
> > Oh okay, very interesting. Just another argument for making the messaging
> > more scalable ;)
> >
> > 2012/9/19 Edward J. Yoon <[email protected]>
> >
> > > Didn't check memory usage because each machine's memory is 48 GB, but I
> > > guess there's no big difference.
> > >
> > > In short, "bin/hama bench 16 10000 32" was the maximum capacity (see [1]).
> > > If the number of messages or nodes is increased, the job always fails.
> > > Hadoop RPC is OK.
> > >
> > > Will need time to debug this.
> > >
> > > 1. http://wiki.apache.org/hama/Benchmarks#Random_Communication_Benchmark
> > >
> > > On 9/19/2012 4:34 PM, Thomas Jungblut wrote:
> > >
> > >> BTW, after HAMA-642 <https://issues.apache.org/jira/browse/HAMA-642> I
> > >> will redesign our messaging system to be completely disk based with
> > >> caching. I will formulate a follow-up issue for this. However, I plan to
> > >> get rid of the RPC anyway; I think it is more efficient to stream the
> > >> messages from disk over the network to the other host via NIO (we can
> > >> later replace it with Netty). This also saves us the time spent on
> > >> checkpointing, because it can be combined with it pretty well. RPC
> > >> requires the whole bundle to be in RAM, which is totally bad.
> > >> Will follow with more details later.
> > >>
> > >> 2012/9/19 Thomas Jungblut <[email protected]>:
> > >>
> > >>> What is more memory efficient?
> > >>>
> > >>> On 19.09.2012 08:23, "Edward J. Yoon" <[email protected]> wrote:
> > >>>
> > >>>> Let's change the default value of RPC in hama-default.xml to Hadoop RPC.
> > >>>> I am testing Hadoop RPC and Avro RPC on a 4-rack cluster. Avro RPC is
> > >>>> criminal. There's no significant performance difference.
> > >>>>
> > >>>> --
> > >>>> Best Regards, Edward J. Yoon
> > >>>> @eddieyoon
> > >>>>
> > >>>>
> > > --
> > > Best Regards, Edward J. Yoon
> > > @eddieyoon
> > >
> > >
> >
>
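
The spilling queue requirements Suraj lists above could be sketched roughly as follows. This is only an illustration under assumptions, not Hama code: the class name SpillingQueue, the memoryLimit parameter, and the length-prefixed temp-file format are all hypothetical. It keeps messages in memory up to a limit, spills the overflow to disk, preserves FIFO order, and uses synchronized methods so that one thread can write while another reads. It does not cover the "persist everything for fault tolerance" case.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;

/** Hypothetical sketch: FIFO queue of byte[] messages that spills overflow to a temp file. */
final class SpillingQueue implements AutoCloseable {
    private final int memoryLimit;                 // max messages held in RAM
    private final ArrayDeque<byte[]> memory = new ArrayDeque<>();
    private Path spillFile;
    private DataOutputStream spillOut;             // unbuffered here; a real queue would buffer
    private DataInputStream spillIn;               // opened lazily on the first disk read
    private long spilled;                          // messages written to disk so far
    private long readBack;                         // messages read back from disk so far

    SpillingQueue(int memoryLimit) {
        this.memoryLimit = memoryLimit;
        try {
            spillFile = Files.createTempFile("spill", ".bin");
            spillOut = new DataOutputStream(Files.newOutputStream(spillFile));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Adds a message; goes to disk if RAM is full or older messages are still on disk. */
    synchronized void add(byte[] msg) {
        if (spilled == readBack && memory.size() < memoryLimit) {
            memory.addLast(msg);
            return;
        }
        try {
            spillOut.writeInt(msg.length);         // length prefix, then payload
            spillOut.write(msg);
            spilled++;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the oldest message, or null if the queue is empty. */
    synchronized byte[] poll() {
        if (!memory.isEmpty()) {
            return memory.pollFirst();             // in-memory messages are always the oldest
        }
        if (readBack == spilled) {
            return null;
        }
        try {
            if (spillIn == null) {
                spillIn = new DataInputStream(Files.newInputStream(spillFile));
            }
            byte[] msg = new byte[spillIn.readInt()];
            spillIn.readFully(msg);
            readBack++;
            return msg;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public synchronized void close() {
        try {
            spillOut.close();
            if (spillIn != null) {
                spillIn.close();
            }
            Files.deleteIfExists(spillFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A sender-side combiner could drain this queue in batches with poll() before sending. Coarse synchronized methods are the simplest way to allow concurrent reads and writes; a tuned version would likely separate the reader and writer locks.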

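Thomas's idea of streaming messages from disk over the network via NIO, instead of holding the whole bundle in RAM for an RPC call, might look like the minimal sketch below. The class name DiskStreamer and the method stream are hypothetical; the only real API relied on is java.nio's FileChannel.transferTo, which copies file bytes to a target channel without loading the whole file into the heap.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: send an on-disk message bundle to a channel without buffering it in RAM. */
final class DiskStreamer {
    /** Copies the whole file at path into target and returns the number of bytes sent. */
    static long stream(Path path, WritableByteChannel target) {
        try (FileChannel file = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = file.size();
            long sent = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            while (sent < size) {
                sent += file.transferTo(sent, size - sent, target);
            }
            return sent;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real setup the target would be a blocking SocketChannel connected to the peer, and the same on-disk file could double as the checkpoint, which is the saving mentioned in the thread.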