Hi Hairong,

I am using 0.20.2. I set "dfs.write.packet.size" to 512B, 32KB, 64KB, 256KB, 512KB, 2MB, and 8MB, and kept bytesPerChecksum at its 512B default. I got similar results as before.
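In case we are talking about different knobs, here is roughly how I am setting it (a minimal sketch of one run; the class name, path, and write loop are just illustrative, not my actual benchmark code):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WritePacketBench {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // 0.20.2 key for the client write packet size (8MB in this run)
    conf.setInt("dfs.write.packet.size", 8 * 1024 * 1024);
    // bytesPerChecksum left at the 512B default
    conf.setInt("io.bytes.per.checksum", 512);

    FileSystem fs = FileSystem.get(conf);
    long blockSize = 128L * 1024 * 1024;              // 128MB blocks
    FSDataOutputStream out = fs.create(new Path("/bench/1g.dat"), true,
        conf.getInt("io.file.buffer.size", 4096), (short) 3, blockSize);

    byte[] buf = new byte[64 * 1024];                 // 64KB client-side writes
    for (long written = 0; written < 1024L * 1024 * 1024; written += buf.length) {
      out.write(buf);                                 // 1GB total
    }
    out.close();
    fs.close();
  }
}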
I think the problem is the packet size, which is the size of the buffer for each write/read on the pipeline. Any ideas? BTW: does "dfs.write.packet.size" in 0.20.2 equal "dfs.client-write-packet-size" in 0.21? (I've put a rough sketch of how I understand the client-side packet sizing at the bottom, below the quoted thread.)

On Tue, Oct 12, 2010 at 3:44 AM, Hairong Kuang <kuang.hair...@gmail.com> wrote:

> This might be caused by the default write packet size. In HDFS, user data
> are pipelined to datanodes in packets. The default packet size is 64K. If the
> chunk size is bigger than 64K, the packet size automatically adjusts to
> include at least one chunk.
>
> Please set the packet size to 8MB by configuring
> dfs.client-write-packet-size (in trunk) and rerun your experiments.
>
> Hairong
>
>
> On 10/8/10 9:42 PM, "elton sky" <eltonsky9...@gmail.com> wrote:
>
> Hello,
>
> I was benchmarking write/read of HDFS.
>
> I changed the chunk size, i.e. bytesPerChecksum or bpc, and created a 1GB file
> with a 128MB block size. The bpc values I used: 512B, 32KB, 64KB, 256KB,
> 512KB, 2MB, 8MB.
>
> The result surprised me. The performance for 512B, 32KB, and 64KB is quite
> similar, and then the throughput decreases as the bpc size increases.
> Comparing 512B to 8MB, there's a 40% to 50% difference in throughput.
>
> Any idea why?
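P.S. For reference, my rough understanding of the client-side packet sizing, as a simplified Java sketch. This is not the actual DFSClient code; the header constant is just a placeholder.

// Simplified sketch of how the client picks the actual packet size;
// HEADER_OVERHEAD is a placeholder, not the exact constant from DFSClient.
static int effectivePacketSize(int writePacketSize, int bytesPerChecksum) {
  final int CHECKSUM_SIZE = 4;       // one CRC32 per chunk
  final int HEADER_OVERHEAD = 25;    // packet header, illustrative only
  int chunkSize = bytesPerChecksum + CHECKSUM_SIZE;
  // At least one full chunk always goes into a packet, so a bpc larger
  // than the configured packet size inflates the packet past the 64KB default.
  int chunksPerPacket = Math.max((writePacketSize - HEADER_OVERHEAD) / chunkSize, 1);
  return HEADER_OVERHEAD + chunkSize * chunksPerPacket;
}

With the defaults (64KB packet, 512B bpc) a packet carries over a hundred chunks; with an 8MB bpc, chunksPerPacket drops to 1 and the packet grows to over 8MB, which is why I suspect the packet size rather than the checksum granularity itself.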