What about a patch to HDFS to reuse these buffers in a pool?
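A minimal sketch of that pooling idea, assuming a simple free-list of fixed-size arrays; the names (ByteArrayPool, acquire, release) are illustrative only and are not actual HDFS APIs:

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch: instead of allocating
// "buf = new byte[PacketHeader.PKT_MAX_HEADER_LEN + pktSize]" for every
// packet, recycle byte[] buffers of a fixed size through a free list.
public class ByteArrayPool {
  private final ConcurrentLinkedQueue<byte[]> free = new ConcurrentLinkedQueue<>();
  private final int bufSize;

  public ByteArrayPool(int bufSize) {
    this.bufSize = bufSize;
  }

  // Reuse a pooled array if one is available, else allocate fresh.
  public byte[] acquire() {
    byte[] b = free.poll();
    return (b != null) ? b : new byte[bufSize];
  }

  // Return an array to the pool for later reuse.
  public void release(byte[] b) {
    if (b != null && b.length == bufSize) {
      free.offer(b);
    }
  }
}
```

A real patch would also have to handle buffers of varying sizes and zero stale contents, but the core trade is the same: a little bookkeeping in exchange for far fewer young-gen allocations.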
On Tue, Jul 29, 2014 at 10:08 AM, Stack <[email protected]> wrote:
> You the man @liang xie. Let me try your suggestion here on my little test
> bench. Let's get the below into the refguide also....
> St.Ack
>
> On Mon, Jul 28, 2014 at 8:03 PM, 谢良 <[email protected]> wrote:
> >
> > The default dfs.client-write-packet-size value is 64k, at least it is in my
> > Hadoop2 env.
> > I ran a benchmark on it via YCSB, loading 2 million records (3*200 bytes):
> > 1) dfs.client-write-packet-size=64k: ygc count 399, ygct 4.208s
> > 2) dfs.client-write-packet-size=8k: ygc count 163, ygct 2.644s
> > As you can see, that's about a 40% benefit on GC time :)
> > The reason: in the DFSOutputStream.Packet class, each "Create a new packet"
> > operation calls "buf = new byte[PacketHeader.PKT_MAX_HEADER_LEN + pktSize];",
> > where "pktSize" comes from the dfs.client-write-packet-size setting. In the
> > HBase write scenario we sync the WAL as soon as possible, so all the new
> > packets are very small (in my YCSB testing, most were only hundreds of bytes
> > or a few kilobytes) and rarely reach 64k, so always allocating a 64k array
> > is just a waste.
> > It would be good to add a note about this to the refguide :)
> >
> > ps: 8k is just a test setting; we should set it according to the real KV
> > size pattern.
> >
> > Thanks,

--
Todd Lipcon
Software Engineer, Cloudera
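For reference, the setting discussed above can be overridden in the client-side hdfs-site.xml; a sketch, using the 8k value from the benchmark (which the thread stresses was only a test setting, not a recommendation):

```xml
<!-- Hypothetical client-side override of the HDFS write packet size.
     Default is 65536 (64k); tune to the real KV size pattern. -->
<property>
  <name>dfs.client-write-packet-size</name>
  <value>8192</value>
</property>
```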
