I'd bet on the Linux file-cache. Assuming you wrote the file with the
default replication factor of 3, one replica is on the local
filesystem, and that is the one you are reading...
Try writing multiple GBs of data and randomly reading large files to
blow your file-cache?
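
As a rough way to check the cache theory on a single local file, here is a minimal Python sketch. It is only an illustration and makes several assumptions: it times a plain local file rather than going through the HDFS client, it is Linux-only, the helper names (drop_file_cache, read_throughput_mb_s) are made up for this example, and POSIX_FADV_DONTNEED is only a hint that the kernel is free to ignore.

```python
import os
import time

CHUNK = 4 * 1024 * 1024  # read in 4 MiB chunks


def drop_file_cache(path):
    """Ask the kernel to evict this file's pages from the page cache.

    Advisory only: dirty pages are not dropped, so flush them first.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)  # write back dirty pages so they become evictable
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)


def read_throughput_mb_s(path):
    """Read the whole file sequentially and return throughput in MB/s."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                break
            total += len(chunk)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return total / elapsed / 1e6
```

Calling read_throughput_mb_s(path) right after writing the file, then drop_file_cache(path), then read_throughput_mb_s(path) again should show whether the first (warm) number is inflated by the page cache. If the cold number falls back to something disk-like, caching is the likely explanation.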
Arun
On Jun 11, 2010, at 10:05 AM, Patrick Donnelly wrote:
Hi List,
I need to explain a higher-than-expected throughput (bandwidth) for an
HDFS C API Client. Specifically, the client is getting bandwidth
higher than its link rate :). The client is first writing a 512 MB
file followed by reading the entire file back. The file read is what's
getting the higher than link rate bandwidth. I assume this is a
consequence of caching? Is this done by HDFS or by Linux?
Thanks for any help,
--
- Patrick Donnelly