On Nov 8, 2010, at 6:42 AM, Sudhir Vallamkondu wrote:
> Just curious as to why this would happen. There are other posts suggesting
> that the datanode is responsible for enforcing a round-robin write strategy
> across the disks specified by the "dfs.data.dir" property.
HDFS only round-robins the disk choice for the first block; from there,
subsequent blocks go to the disks in sequential order.
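For anyone following along, the setup in question is just a datanode with
several mount points listed in dfs.data.dir, along these lines (the paths are
placeholders, not anything from the original post):

    dfs.data.dir = /data/1/dfs/data,/data/2/dfs/data,/data/3/dfs/data

Each block replica the datanode receives is written to exactly one of those
directories, and it is that per-block disk choice the question is about.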
Also keep in mind that if the file systems are shared with MapReduce, any time
MR goes beyond the 'reserved' space, the overage counts against HDFS. [This
likely happens more often than people realize.]
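To make that concrete: with the usual layout where MapReduce's local
directories sit on the same mounts, the relevant settings look something like
this (placeholder paths and numbers again):

    mapred.local.dir         = /data/1/mapred/local,/data/2/mapred/local,/data/3/mapred/local
    dfs.datanode.du.reserved = 10737418240    (10 GB held back per volume for non-DFS use)

If a job then spills, say, 50 GB of intermediate output onto one of those
disks, the 40 GB beyond the reserved amount comes straight out of space the
datanode had been reporting as available to HDFS.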
As files of varying sizes get added and removed, this scheme does a decent job
of keeping the disks balanced in the average case, but there will eventually
be outliers.
Also keep in mind that unless the whole node is completely newfs'd, an
individual drive failure (and replacement) will leave one filesystem empty
while the others still hold data. There is no "catch up" mechanism in HDFS
that puts extra blocks on the newly empty drive; the balancer only evens out
space between datanodes, not between the disks inside a single node.
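If you want to see how skewed a node has gotten, checking free space per data
directory is enough. Here is a throwaway sketch (nothing built into Hadoop;
the comma-separated argument is just whatever you have in dfs.data.dir):

    import java.io.File;

    // Prints total vs. free space for each data directory so a freshly
    // replaced (mostly empty) disk stands out next to nearly full ones.
    public class VolumeSkew {
        public static void main(String[] args) {
            // Pass the same comma-separated list used for dfs.data.dir.
            for (String path : args[0].split(",")) {
                File dir = new File(path.trim());
                long totalGb = dir.getTotalSpace() >> 30;
                long freeGb  = dir.getUsableSpace() >> 30;
                System.out.printf("%-30s total=%d GB, free=%d GB%n",
                                  path.trim(), totalGb, freeGb);
            }
        }
    }

Run it as, e.g., java VolumeSkew /data/1/dfs/data,/data/2/dfs/data,/data/3/dfs/data
and a newly reformatted drive will show up as almost entirely free.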