I'd bet on the Linux file cache. Assuming you wrote the file with the default replication factor of 3, there is one replica on the local filesystem, which is the one you are reading...

Try writing multiple GBs of data and randomly reading large files to blow out your file cache before you time the read?
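A rough way to check the caching theory on a single machine is sketched below. It is only a local-file analogue in Python, not the HDFS C API: it writes a file, times a sequential read, and compares the result with the link rate. The function names (link_rate_mb_s, write_test_file, read_throughput_mb_s), the 64 MB file size and the 1 Gbit/s link are all assumptions made for illustration. On Linux it can also ask the kernel to drop the file's cached pages through posix_fadvise, which is advisory and may not evict everything.

```python
import os
import tempfile
import time


def link_rate_mb_s(link_mbit_s):
    """Convert a link rate in Mbit/s to MB/s (10^6 bytes per second)."""
    return link_mbit_s / 8.0


def write_test_file(size_bytes):
    """Write size_bytes of zeros to a temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    block = b"\0" * (1 << 20)
    with os.fdopen(fd, "wb") as f:
        remaining = size_bytes
        while remaining > 0:
            n = min(remaining, len(block))
            f.write(block[:n])
            remaining -= n
        f.flush()
        # Force the data to disk so the pages are clean and can be dropped.
        os.fsync(f.fileno())
    return path


def read_throughput_mb_s(path, chunk_size=1 << 20, drop_cache=False):
    """Read the whole file sequentially and return the throughput in MB/s."""
    fd = os.open(path, os.O_RDONLY)
    try:
        if drop_cache and hasattr(os, "posix_fadvise"):
            # Advisory hint: ask the kernel to discard this file's cached pages.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        total = 0
        start = time.perf_counter()
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            total += len(chunk)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return total / 1e6 / max(elapsed, 1e-9)


if __name__ == "__main__":
    path = write_test_file(64 * (1 << 20))  # assumed size, small for a demo
    try:
        warm = read_throughput_mb_s(path)
        cold = read_throughput_mb_s(path, drop_cache=True)
        limit = link_rate_mb_s(1000)  # assumed 1 Gbit/s link
        print("warm read: %.1f MB/s" % warm)
        print("after cache drop: %.1f MB/s" % cold)
        print("link rate: %.1f MB/s" % limit)
    finally:
        os.remove(path)
```

If the warm read is well above the link rate and the read after the cache drop is not, the page cache is the likely explanation. For the real test, the same timing would have to wrap the HDFS client's own read calls.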

Arun

On Jun 11, 2010, at 10:05 AM, Patrick Donnelly wrote:

Hi List,

I need to explain a higher-than-expected throughput (bandwidth) for an
HDFS C API client. Specifically, the client is getting bandwidth
higher than its link rate :). The client first writes a 512 MB
file and then reads the entire file back. The file read is what's
getting the higher-than-link-rate bandwidth. I assume this is a
consequence of caching? Is this done by HDFS or by Linux?

Thanks for any help,

--
- Patrick Donnelly
