Konstantin Shvachko wrote:
> 200 bytes per file is theoretically correct, but rather optimistic :-(
> From real-system memory utilization I can see that HDFS uses 1.5-2K per
> file. And since each file is internally represented by two files
> (1 real + 1 crc), the real estimate per file should be 3-4K.
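For concreteness, here is a rough back-of-envelope sketch of what those
numbers imply for NameNode heap. The 1.5-2K per namespace entry and the
doubling for the .crc companion are the figures above; the file count
and class name are made-up examples, not measurements:

// Back-of-envelope NameNode heap estimate, assuming ~2K of heap per
// namespace entry and a .crc companion that doubles the entry count.
// The 10M file count is an arbitrary example, not a figure from this thread.
public class NamenodeHeapEstimate {
    public static void main(String[] args) {
        long files = 10_000_000L;     // hypothetical file count
        long bytesPerEntry = 2_000L;  // observed upper bound per namespace entry
        long entriesPerFile = 2L;     // 1 data file + 1 .crc companion

        long heapBytes = files * bytesPerEntry * entriesPerFile;
        System.out.printf("~%.1f GB of NameNode heap for %,d files%n",
                heapBytes / 1e9, files);
    }
}

Under those assumptions, 10 million files works out to roughly 40 GB of
NameNode heap.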
But also note that there are plans to address these issues over the
coming months. For a start:
https://issues.apache.org/jira/browse/HADOOP-803
https://issues.apache.org/jira/browse/HADOOP-928
Once checksums are optional, we can replace their implementation in
HDFS with something that does not consume namespace.
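To illustrate the kind of replacement meant here (a sketch only, not the
design in HADOOP-928), checksums could be computed per fixed-size chunk
inline with the data and stored by the datanode alongside the block
replica, so no separate .crc file appears in the namespace. The chunk
size, class, and method names below are assumptions for illustration:

import java.util.zip.CRC32;

// Illustrative only: per-chunk CRC32 checksums computed inline with the
// data, of the sort a datanode could keep next to the block replica
// instead of HDFS tracking a separate .crc file in the namespace.
public class ChunkChecksums {
    static final int CHUNK_SIZE = 512;  // assumed checksum chunk size

    /** Returns one CRC32 value per CHUNK_SIZE slice of the data. */
    static long[] checksumChunks(byte[] data) {
        int chunks = (data.length + CHUNK_SIZE - 1) / CHUNK_SIZE;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            int off = i * CHUNK_SIZE;
            int len = Math.min(CHUNK_SIZE, data.length - off);
            crc.reset();
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1300];           // example payload
        long[] sums = checksumChunks(data);
        System.out.println(sums.length + " chunk checksums computed");
    }
}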
Long term, we hope to approach ~100 bytes per file.
Doug