P.S., ignore the "+ data = null;" line.

On Wed, Sep 26, 2012 at 10:16 AM, Edward J. Yoon <[email protected]> wrote:
> FYI, the problem is fixed by the small patch below.
>
> Index: src/main/java/org/apache/hama/bsp/message/AvroMessageManagerImpl.java
> ===================================================================
> --- src/main/java/org/apache/hama/bsp/message/AvroMessageManagerImpl.java (revision 1389695)
> +++ src/main/java/org/apache/hama/bsp/message/AvroMessageManagerImpl.java (working copy)
> @@ -135,12 +135,15 @@
>        ByteArrayInputStream inArray = new ByteArrayInputStream(byteArray);
>        DataInputStream in = new DataInputStream(inArray);
>        msg.readFields(in);
> +      in.close();
> +      inArray.close();
>      } else {
>        peer.incrementCounter(BSPPeerImpl.PeerCounter.COMPRESSED_BYTES_RECEIVED,
>            byteArray.length);
>        msg = compressor.decompressBundle(new BSPCompressedBundle(byteArray));
>      }
>
> +    byteArray = null;
>      return msg;
>    }
>
> @@ -154,12 +157,14 @@
>      byte[] byteArray = outArray.toByteArray();
>      peer.incrementCounter(BSPPeerImpl.PeerCounter.MESSAGE_BYTES_TRANSFERED,
>          byteArray.length);
> +    outArray.close();
>      return ByteBuffer.wrap(byteArray);
>    } else {
>      BSPCompressedBundle compMsgBundle = compressor.compressBundle(msg);
>      byte[] data = compMsgBundle.getData();
>      peer.incrementCounter(BSPPeerImpl.PeerCounter.COMPRESSED_BYTES_SENT,
>          data.length);
> +    data = null;
>      return ByteBuffer.wrap(data);
>    }
>  }
>
> On Thu, Sep 20, 2012 at 4:22 PM, Edward J. Yoon <[email protected]> wrote:
>> It seems we didn't test the 0.5 release thoroughly.
>>
>> Let's give ourselves more time.
>>
>> On Wed, Sep 19, 2012 at 8:00 PM, Thomas Jungblut
>> <[email protected]> wrote:
>>> Wait, the problem had multiple roots. We just fixed a few. I will sit
>>> down and take a look at the messaging.
>>> If this is scalable, we can remove the multi-step partitioning.
>>> On 19.09.2012 12:21, "Edward J.
Yoon" <[email protected]>: >>> >>>> P.S., Since memory issue of graph job will be fixed by Thomas's >>>> HAMA-642, I'll remove my dirty multi-step partitioning code in graph >>>> module if there's no problem w/ Hadoop RPC tomorrow. >>>> >>>> On Wed, Sep 19, 2012 at 5:53 PM, Thomas Jungblut >>>> <[email protected]> wrote: >>>> > I will give you more details what I planned on the interface changes once >>>> > I'm back from my lecture. >>>> > >>>> > 2012/9/19 Suraj Menon <[email protected]> >>>> > >>>> >> As a beginning we should have a spilling queue and the same with >>>> combiner >>>> >> running in batch if possible. >>>> >> I have been looking into implementing the spilling queue. Chalking out >>>> the >>>> >> requirements, we should look into the following: >>>> >> >>>> >> A queue should persist all the data if required by the framework for >>>> fault >>>> >> tolerance. ( I feel it would be a bad idea for framework to spend >>>> resource >>>> >> on making a separate copy of the file ) >>>> >> Asynchronous communication is our next important feature required for >>>> >> performance.Hence we would need this queue with combiner on sender side >>>> to >>>> >> batch the messages before sending. This implies we need to support both >>>> >> concurrent reads and writes. >>>> >> >>>> >> -Suraj >>>> >> >>>> >> On Wed, Sep 19, 2012 at 4:21 AM, Thomas Jungblut >>>> >> <[email protected]>wrote: >>>> >> >>>> >> > Oh okay, very interesting. Just another argument for making the >>>> messaging >>>> >> > more scalable ;) >>>> >> > >>>> >> > 2012/9/19 Edward J. Yoon <[email protected]> >>>> >> > >>>> >> > > Didn't check memory usage because each machine's memory is 48 GB, >>>> but I >>>> >> > > guess there's no big difference. >>>> >> > > >>>> >> > > In short, "bin/hama bench 16 10000 32" was maximum capacity (See >>>> [1]). >>>> >> If >>>> >> > > message numbers or nodes are increased, job is always fails. Hadoop >>>> RPC >>>> >> > is >>>> >> > > OK. 
>>>> >> > >
>>>> >> > > Will need time to debug this.
>>>> >> > >
>>>> >> > > 1. http://wiki.apache.org/hama/Benchmarks#Random_Communication_Benchmark
>>>> >> > >
>>>> >> > > On 9/19/2012 4:34 PM, Thomas Jungblut wrote:
>>>> >> > >
>>>> >> > >> BTW, after HAMA-642 <https://issues.apache.org/jira/browse/HAMA-642>
>>>> >> > >> I will redesign our messaging system to be completely disk-based
>>>> >> > >> with caching. I will formulate a follow-up issue for this.
>>>> >> > >> However, I plan to get rid of the RPC anyway; I think it is more
>>>> >> > >> efficient to stream the messages from disk over the network to
>>>> >> > >> the other host via NIO (we can later replace it with Netty).
>>>> >> > >> Also, this saves us the time to do the checkpointing, because it
>>>> >> > >> can be combined with this pretty well. RPC requires the whole
>>>> >> > >> bundle to be in RAM, which is totally bad.
>>>> >> > >> Will follow with more details later.
>>>> >> > >>
>>>> >> > >> 2012/9/19 Thomas Jungblut <[email protected]>:
>>>> >> > >>
>>>> >> > >>> What is more memory efficient?
>>>> >> > >>>
>>>> >> > >>> On 19.09.2012 08:23, "Edward J. Yoon" <[email protected]>:
>>>> >> > >>>
>>>> >> > >>>> Let's change the default value of RPC in hama-default.xml to
>>>> >> > >>>> Hadoop RPC. I am testing Hadoop RPC and Avro RPC on a 4-rack
>>>> >> > >>>> cluster. Avro RPC is criminal. There's no significant
>>>> >> > >>>> performance difference.
>>>> >> > >>>>
>>>> >> > >>>> --
>>>> >> > >>>> Best Regards, Edward J.
>>>> >> > >>>> Yoon
>>>> >> > >>>> @eddieyoon
>>>> >> > >>>>
>>>> >> > >
>>>> >> > > --
>>>> >> > > Best Regards, Edward J. Yoon
>>>> >> > > @eddieyoon
>>>> >> > >
>>>> >> >
>>>> >>
>>>>
>>>> --
>>>> Best Regards, Edward J. Yoon
>>>> @eddieyoon
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
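[Editor's sketch] Thomas's plan quoted above — streaming spilled messages from disk to the remote peer via NIO instead of holding a whole RPC bundle in RAM — essentially comes down to `FileChannel.transferTo`, which lets the kernel move the bytes without a user-space copy. A minimal sketch under that assumption; the class name is hypothetical, and the target is generalized to `WritableByteChannel` (a `SocketChannel` in a real sender):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Sketch of disk-to-network streaming via NIO. Only a bounded kernel
 * buffer is in flight at any time, never the whole message bundle.
 */
public class DiskStreamer {
  /** Streams the whole file into `out`; returns the byte count. */
  public static long stream(Path messageFile, WritableByteChannel out)
      throws IOException {
    try (FileChannel in =
        FileChannel.open(messageFile, StandardOpenOption.READ)) {
      long position = 0;
      long size = in.size();
      while (position < size) {
        // transferTo may move fewer bytes than requested, so loop
        // until the whole file has been pushed into the channel.
        position += in.transferTo(position, size - position, out);
      }
      return position;
    }
  }
}
```

Because the target is any `WritableByteChannel`, the same code can be exercised against a file channel in tests and pointed at a socket in production.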
--
Best Regards, Edward J. Yoon
@eddieyoon
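[Editor's sketch] The stream-closing half of the patch at the top of this thread can also be written with Java 7 try-with-resources, which closes the streams even when deserialization throws. (The P.S. retracts the `+ data = null;` line: nulling `data` immediately before `return ByteBuffer.wrap(data);` would throw a NullPointerException.) A standalone illustration, with Hama's Writable message replaced by a plain int payload and hypothetical method names:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/**
 * Mirrors the patched receive/send paths in AvroMessageManagerImpl:
 * try-with-resources guarantees both streams are closed, even if the
 * readFields-style deserialization throws midway.
 */
public class MessageCodec {
  /** Stand-in for the receive path (msg.readFields(in) in the patch). */
  public static int readIntMessage(byte[] byteArray) throws IOException {
    try (ByteArrayInputStream inArray = new ByteArrayInputStream(byteArray);
         DataInputStream in = new DataInputStream(inArray)) {
      return in.readInt();
    }
  }

  /** Stand-in for the send path (outArray.toByteArray() in the patch). */
  public static byte[] writeIntMessage(int value) throws IOException {
    try (ByteArrayOutputStream outArray = new ByteArrayOutputStream();
         DataOutputStream out = new DataOutputStream(outArray)) {
      out.writeInt(value);
      out.flush();
      return outArray.toByteArray();
    }
  }
}
```

For byte-array-backed streams the close is admittedly a no-op, so the real win of the patch is dropping the references early for the garbage collector; try-with-resources simply makes the close unconditional and exception-safe.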
