> - 1 replica / 1 slave case writes at 15MB/sec. This seems to point
> the performance problem to how the datanode writes data (even to itself).
On Hadoop, most of the delay you are seeing for the 1 replica test with
one node is because of this: it first writes 64MB to a local tmp file,
then it sends that 64MB file over the (local) ethernet to the DataNode
on the same node before starting to write the next 64MB. Writing to the
tmp file and sending to the DataNode are *not* pipelined.
Disk b/w is not always equal to the raw serial read/write bandwidth you
get on a fresh partition with a large disk. (In fact, 75MBps sounds
pretty high. What kind of disk is it? Is it a RAID, or a 10K rpm disk?)
I would suggest a simple exercise: write a 20GB file with dd, as you
initially did when you measured 75MBps. Now read this file and write
another 20GB at the same time. Do you see 38MBps for each of the read
and the write? You mostly won't. Where did the missing bandwidth go?
You could repeat this on a partition that is 80% full. There are more
factors that affect disk performance than raw serial read/write b/w,
the most important of them being disk seeks.
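For example, one way to run that test (the file names and mount point
below are just placeholders; make the files much larger than RAM so the
read really hits the disk and not the page cache):

  # serial write -- this is the kind of test that showed ~75MBps
  dd if=/dev/zero of=/mnt/data/ddtest.1 bs=1M count=20480

  # now read the first file back while writing a second 20GB file,
  # and compare the rates dd reports against 75/2 = ~38MBps each
  dd if=/mnt/data/ddtest.1 of=/dev/null bs=1M &
  dd if=/dev/zero of=/mnt/data/ddtest.2 bs=1M count=20480 &
  wait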
This is not Hadoop related, and Hadoop's inefficiencies are not
necessarily due to the same cause.
Also, the 30MBps you measured for your network is most likely limited
by ssh processing in scp rather than by the b/w of the network itself.
How can you confirm it?
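One rough way to check (the hostname and port below are placeholders,
and this assumes netcat is installed on both machines; some netcat
versions want '-l -p 12345' instead of '-l 12345') is to push the same
amount of data over a raw TCP connection and compare with the scp rate:

  # on the receiving node
  nc -l 12345 > /dev/null

  # on the sending node: dd reports the transfer rate when it finishes
  dd if=/dev/zero bs=1M count=2048 | nc receiver-host 12345

If the raw TCP rate comes out well above 30MBps, then ssh/scp is the
bottleneck, not the network.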
Raghu.
Bwolen Yang wrote:
Raghu,
The 1 replica and "du" suggestions are good. thank you.
To further reduce the variables, I also tried the 1 replica / 1 slave
case. (namenode and jobtracker are still on their own machines.)
- randomwriter:
- 1 replica / 1 slave case writes at 15MB/sec. This seems to point
the performance problem to how the datanode writes data (even to itself).
- The 1 replica / 5 slave case's running time is 1/4th of the 3 replica
case's. Perfect scaling would have been 1/3rd (the 3 replica run took
about 4x the 1 replica time where 3x would be ideal). So, there is a 33%
additional performance overhead lost to replication (beyond writing 3x
as much data).
- Looks like for every 5GB of data I put into Hadoop DFS, it was using
up ~18GB.... Turned out there are a few blocks that are only a few KB,
so estimating usage from the block count overstates it. "du" is the
right tool. The actual raw disk overhead is only 1%. thanks.
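In case anyone wants to repeat the check, it was roughly this (the path
is just an example; use whatever dfs.data.dir points to on the slave):

  # actual bytes on disk under the datanode's data directory
  du -sh /path/to/dfs/data

  # sizes of the individual block files (this is what exposes the
  # few-KB blocks that throw off a "blocks x 64MB" estimate)
  find /path/to/dfs/data -name 'blk_*' -exec ls -l {} \; | sort -n -k5

and then compare the du total against the amount of data written into DFS.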
> You are assuming each block is 64MB. There are some blocks for "CRC
> files". Did you try to du the datanode's 'data directories'?
All blk_* files are 64MB or less.
However, some mappers still show it is accessing
part-0:1006632960+70663780
where 70663780 is about 67MB. Hmm... looks like it is only doing so
at the last block. I guess that's not too bad.
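A quick sanity check on those numbers with shell arithmetic (the offset
sits exactly on a 64MB block boundary, and the length is a bit over one
block):

  echo $(( 1006632960 % (64 * 1024 * 1024) ))   # 0  -> offset is an exact multiple of 64MB
  echo $(( 1006632960 / (64 * 1024 * 1024) ))   # 15 -> i.e., 15 full blocks in
  echo $(( 70663780 / (1024 * 1024) ))          # 67 -> split length is ~67MB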
> They are pipelined.
you're right :). the slowness exists even in the single slave / single
replica case.
thanks
bwolen