Re: Hadoop RPC as a default

Edward J. Yoon Wed, 19 Sep 2012 03:21:26 -0700

P.S. Since the memory issue in graph jobs will be fixed by Thomas's
HAMA-642, I'll remove my dirty multi-step partitioning code in the graph
module if there's no problem with Hadoop RPC tomorrow.


On Wed, Sep 19, 2012 at 5:53 PM, Thomas Jungblut
<[email protected]> wrote:
> I will give you more details on what I planned for the interface changes once
> I'm back from my lecture.
>
> 2012/9/19 Suraj Menon <[email protected]>
>
>> As a start, we should have a spilling queue, and the same with a combiner
>> running in batch if possible.
>> I have been looking into implementing the spilling queue. Chalking out the
>> requirements, we should look into the following:
>>
>> A queue should persist all the data if required by the framework for fault
>> tolerance. (I feel it would be a bad idea for the framework to spend resources
>> on making a separate copy of the file.)
>> Asynchronous communication is our next important feature required for
>> performance. Hence we would need this queue with a combiner on the sender side to
>> batch the messages before sending. This implies we need to support both
>> concurrent reads and writes.
>>
>> -Suraj
>>
>> On Wed, Sep 19, 2012 at 4:21 AM, Thomas Jungblut
>> <[email protected]> wrote:
>>
>> > Oh okay, very interesting. Just another argument for making the messaging
>> > more scalable ;)
>> >
>> > 2012/9/19 Edward J. Yoon <[email protected]>
>> >
>> > > Didn't check memory usage because each machine's memory is 48 GB, but I
>> > > guess there's no big difference.
>> > >
>> > > In short, "bin/hama bench 16 10000 32" was the maximum capacity (see [1]). If
>> > > the number of messages or nodes is increased, the job always fails. Hadoop
>> > > RPC is OK.
>> > >
>> > > Will need time to debug this.
>> > >
>> > > 1. http://wiki.apache.org/hama/Benchmarks#Random_Communication_Benchmark
>> > >
>> > > On 9/19/2012 4:34 PM, Thomas Jungblut wrote:
>> > >
>> > >> BTW, after HAMA-642 <https://issues.apache.org/jira/browse/HAMA-642> I will
>> > >> redesign our messaging system to be completely disk based with caching.
>> > >> I will formulate a follow-up issue for this. However, I plan to get rid of
>> > >> the RPC anyway; I think it is more efficient to stream the messages from
>> > >> disk over the network to the other host via NIO (we can later replace it
>> > >> with Netty). Also, this saves us the time to do the checkpointing, because
>> > >> this can be combined with it pretty well. RPC requires the whole bundle to
>> > >> be in RAM, which is totally bad.
>> > >> Will follow with more details later.
>> > >>
>> > >> 2012/9/19 Thomas Jungblut <[email protected]>:
>> > >>
>> > >>> What is more memory efficient?
>> > >>>
>> > >>> On 19.09.2012 at 08:23, "Edward J. Yoon" <[email protected]> wrote:
>> > >>>
>> > >>>> Let's change the default value of RPC in hama-default.xml to Hadoop RPC.
>> > >>>> I am testing Hadoop RPC and Avro RPC on a 4-rack cluster. Avro RPC is
>> > >>>> criminal. There's no significant performance difference.
>> > >>>>
>> > >>>> --
>> > >>>> Best Regards, Edward J. Yoon
>> > >>>> @eddieyoon
>> > >>>>
>> > >>>>
>> > > --
>> > > Best Regards, Edward J. Yoon
>> > > @eddieyoon
>> > >
>> > >
>> >
>>



-- 
Best Regards, Edward J. Yoon
@eddieyoon
