What about a patch to HDFS to reuse these buffers in a pool?
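Something like the below, maybe. (Rough sketch only; the class and method names here are made up for illustration, not actual HDFS code.)

```java
// Hypothetical sketch: reuse fixed-size packet buffers via a simple pool,
// so each new DFSOutputStream packet doesn't allocate a fresh byte array.
import java.util.concurrent.ConcurrentLinkedQueue;

public class PacketBufferPool {
    private final ConcurrentLinkedQueue<byte[]> pool = new ConcurrentLinkedQueue<>();
    private final int bufSize;

    public PacketBufferPool(int bufSize) {
        this.bufSize = bufSize;
    }

    /** Returns a pooled buffer if one is available, else allocates a new one. */
    public byte[] acquire() {
        byte[] buf = pool.poll();
        return (buf != null) ? buf : new byte[bufSize];
    }

    /** Hands the buffer back for reuse once the packet has been acked. */
    public void release(byte[] buf) {
        // Only re-pool buffers of the expected size (config could change).
        if (buf.length == bufSize) {
            pool.offer(buf);
        }
    }
}
```

A real patch would also want to cap the pool size and clear buffers on release, but the point is just to amortize the per-packet 64k allocations instead of leaving them all for the young-gen collector.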

On Tue, Jul 29, 2014 at 10:08 AM, Stack <[email protected]> wrote:

> You the man @liang xie.  Let me try your suggestion here on my little test
> bench.  Let's get the below into the refguide also....
> St.Ack
>
>
> On Mon, Jul 28, 2014 at 8:03 PM, 谢良 <[email protected]> wrote:
>
> > The default dfs.client-write-packet-size value is 64k, at least in my
> > Hadoop2 env.
> > I did a benchmark on it via ycsb, loading 2 million records (3*200 bytes):
> > 1) dfs.client-write-packet-size=64k  ygc count: 399, ygct: 4.208s
> > 2) dfs.client-write-packet-size=8k   ygc count: 163, ygct: 2.644s
> > As you can see, that's about a 40% benefit on GC time :)
> > It's because: in the DFSOutputStream.Packet class, each "Create a new
> > packet" operation calls
> > "buf = new byte[PacketHeader.PKT_MAX_HEADER_LEN + pktSize];",
> > where "pktSize" comes from the dfs.client-write-packet-size setting. In
> > the HBase write scenario we sync the WAL asap, so all the new packets
> > are very small (in my ycsb testing, most of them were only hundreds of
> > bytes or a few kilobytes) and rarely reach 64k, so always allocating a
> > 64k array is just a waste.
> > It would be good to add a note about this to the refguide :)
> >
> > ps: 8k is just a test setting; we should set it according to the real
> > KV size pattern.
> >
> > Thanks,
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera
