On Nov 8, 2010, at 6:42 AM, Sudhir Vallamkondu wrote:
> Just curious as to why this would happen. There are other posts suggesting
> that the datanode is responsible for enforcing a round-robin write strategy
> across the disks specified by the "dfs.data.dir" property.
HDFS only round-robins the disk choice for the first block; from there,
subsequent blocks go to the disks in sequential order.
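For anyone following along, the setup in question is just a datanode with
several mount points listed in dfs.data.dir, along these lines (the paths are
placeholders, not anything from the original post):

    dfs.data.dir = /data/1/dfs/data,/data/2/dfs/data,/data/3/dfs/data

Each block replica the datanode receives is written to exactly one of those
directories, and it is that per-block disk choice the question is about.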
Also keep in mind that if the file systems are shared with MapReduce, any time
MR goes beyond the 'reserved' space, the overage counts against HDFS. [This
likely happens more often than people realize.]
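To make that concrete: with the usual layout where MapReduce's local
directories sit on the same mounts, the relevant settings look something like
this (placeholder paths and numbers again):

    mapred.local.dir         = /data/1/mapred/local,/data/2/mapred/local,/data/3/mapred/local
    dfs.datanode.du.reserved = 10737418240    (10 GB held back per volume for non-DFS use)

If a job then spills, say, 50 GB of intermediate output onto one of those
disks, the 40 GB beyond the reserved amount comes straight out of space the
datanode had been reporting as available to HDFS.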
As files of varying sizes get added and removed, this scheme does a decent job
of keeping the disks balanced in the average case, but there will eventually
be outliers.
Also keep in mind that unless the whole node is completely newfs'd, an
individual drive failure (and replacement) will leave one filesystem empty
while the others still hold data. There is no "catch up" mechanism in HDFS
that puts extra blocks on the newly empty drive; the balancer only evens out
space between datanodes, not between the disks inside a single node.
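If you want to see how skewed a node has gotten, checking free space per data
directory is enough. Here is a throwaway sketch (nothing built into Hadoop;
the comma-separated argument is just whatever you have in dfs.data.dir):

    import java.io.File;

    // Prints total vs. free space for each data directory so a freshly
    // replaced (mostly empty) disk stands out next to nearly full ones.
    public class VolumeSkew {
        public static void main(String[] args) {
            // Pass the same comma-separated list used for dfs.data.dir.
            for (String path : args[0].split(",")) {
                File dir = new File(path.trim());
                long totalGb = dir.getTotalSpace() >> 30;
                long freeGb  = dir.getUsableSpace() >> 30;
                System.out.printf("%-30s total=%d GB, free=%d GB%n",
                                  path.trim(), totalGb, freeGb);
            }
        }
    }

Run it as, e.g., java VolumeSkew /data/1/dfs/data,/data/2/dfs/data,/data/3/dfs/data
and a newly reformatted drive will show up as almost entirely free.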