I have a bunch of datanodes with several disks each, and I've noticed that DFS blocks sometimes don't get distributed evenly among them. For instance, one of my machines has six disks: five at 500 GB each and one at 2 TB. The five smaller disks are each 98% full, whereas the larger one is only 12% full. It seems as though DFS should do better by putting more of the blocks on the larger disk first. Meanwhile, MapReduce jobs are failing on this machine with the error "java.io.IOException: No space left on device".
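To put numbers on the imbalance, here is a rough sketch of the arithmetic for that machine. It only uses the sizes and percentages quoted above, treats 2 TB as 2000 GB (an assumption), and the helper name `balanced_usage` is made up for illustration. It does not model how the datanode actually picks a disk.

```python
# Disk sizes in GB and the fraction of each that is in use, taken from the
# figures quoted above. Treating 2 TB as 2000 GB is an assumption.
disks = [(500, 0.98)] * 5 + [(2000, 0.12)]

total_capacity = sum(size for size, _ in disks)
total_used = sum(size * frac for size, frac in disks)
overall_fraction = total_used / total_capacity


def balanced_usage(size_gb):
    """GB a disk would hold if every disk were filled to the same percentage."""
    return size_gb * overall_fraction


print(f"capacity: {total_capacity} GB, used: {total_used:.0f} GB "
      f"({overall_fraction:.1%} overall)")
for size, frac in disks:
    print(f"{size:>5} GB disk: now {size * frac:.0f} GB, "
          f"balanced {balanced_usage(size):.0f} GB")
```

By this estimate the machine as a whole is only about 60% full, so the data already on it would fit comfortably if it were spread in proportion to disk size.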
Any thoughts or suggestions? Thanks in advance.

--
permanent contact information at http://mikerandrews.com