Your interest is good. I think you should ask even fewer questions in each mail and try to do more experimentation yourself.
Bwolen Yang wrote:
Here is a summary of my remaining questions from the [write and sort performance] thread. - It looks like for every 5GB of data I put into Hadoop DFS, ~18GB of raw disk space is used (based on block counts exported from the namenode). Accounting for 3x replication, I was expecting 15GB. What's causing this 20% overhead?
You are assuming each block is 64MB; a file's last block is usually only partially full, so multiplying block counts by 64MB overestimates the space used. There are also some blocks for the "CRC files". Did you try running du on the datanodes' 'data directories'?
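For what it's worth, here is a minimal sketch (assuming a Hadoop client with fs.default.name pointing at your namenode, and a non-recursive listing of one directory passed as the first argument) that compares the block-count estimate against the actual file lengths the namenode reports. Partial last blocks are usually where the "missing" space goes.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class BlockAccounting {
    public static void main(String[] args) throws Exception {
      // Picks up fs.default.name from the config files on the classpath.
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);

      long actual = 0;     // sum of real file lengths * replication
      long estimated = 0;  // whole blocks * block size * replication

      for (FileStatus st : fs.listStatus(new Path(args[0]))) {
        if (st.isDir()) continue;
        // Number of blocks, rounding the last (partial) block up.
        long blocks = (st.getLen() + st.getBlockSize() - 1) / st.getBlockSize();
        actual    += st.getLen() * st.getReplication();
        estimated += blocks * st.getBlockSize() * st.getReplication();
      }
      System.out.println("actual bytes (x replication): " + actual);
      System.out.println("block-count estimate:         " + estimated);
    }
  }

The gap between the two numbers is the part of your 20% that is just rounding up to full blocks; the rest should show up as CRC data when you du the datanode directories.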
- When a large amount of data is written to HDFS (for example with copyFromLocal), is the block replication pipelined? Also, does one 64MB block need to be fully replicated before the copy of the next 64MB block can start?
They are pipelined. Again, you can experiment by trying a single replica (set in the config) and seeing whether the copy runs much faster. If it does not, then the replicas should be pipelined.
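A rough sketch of that experiment (the key is the dfs.replication setting; the source and destination paths are just placeholders taken from the command line):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class ReplicationTiming {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Ask for a single replica for files written by this client.
      conf.set("dfs.replication", "1");
      FileSystem fs = FileSystem.get(conf);

      Path src = new Path(args[0]);   // local file to copy
      Path dst = new Path(args[1]);   // destination in DFS

      long start = System.currentTimeMillis();
      fs.copyFromLocalFile(src, dst); // same as bin/hadoop dfs -copyFromLocal
      System.out.println("copy took " + (System.currentTimeMillis() - start) + " ms");
    }
  }

Run it once with the line setting dfs.replication to 1 and once without (i.e. with the default of 3). If the single-replica copy is not much faster, the extra replicas are being written in a pipeline rather than one after another.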
Raghu.
