Also, I should add: disks degrade and can continue to work with poor
performance for a long time, not just briefly before failing outright. I
was looking at some benchmarks and could isolate some of those machines.
See HADOOP-1649.
If DFS scheduled fewer writes on such nodes, it would help the whole
cluster and would make speculative execution of tasks in map/reduce more
effective.
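
To make that concrete, here is a rough sketch (hypothetical; not the
actual DFS placement code) of the kind of policy I mean: pick write
targets with probability inversely proportional to each datanode's
recent disk latency, so chronically slow nodes receive fewer block
writes. The per-node latency map is an assumed metric that the namenode
would have to collect somehow.

import java.util.*;

public class LatencyAwarePicker {
    private final Random rand = new Random();

    /** nodeLatencyMs: recent average write latency per datanode (assumed metric). */
    public String pickTarget(Map<String, Double> nodeLatencyMs) {
        Map<String, Double> weights = new LinkedHashMap<>();
        double total = 0.0;
        for (Map.Entry<String, Double> e : nodeLatencyMs.entrySet()) {
            // slower recent writes -> lower chance of being chosen
            double w = 1.0 / Math.max(e.getValue(), 1.0);
            weights.put(e.getKey(), w);
            total += w;
        }
        // weighted random selection over the datanodes
        double r = rand.nextDouble() * total;
        String last = null;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            last = e.getKey();
            r -= e.getValue();
            if (r <= 0.0) return last;
        }
        return last; // null only if the input map was empty
    }
}

Down-weighting rather than excluding slow nodes outright keeps a trickle
of writes going to them, so the latency measurement stays current and a
node that recovers gets its share back.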
Raghu.
Raghu Angadi wrote:
I have not watched large Hadoop clusters closely, but from my experience
with other large clusters under heavy, seek-dominated disk loads, the
behavior you are seeing looks consistent. Some disks do become very slow,
and if they are part of a RAID array, the whole array runs at the speed
of the slowest disk. iostat -x also helps confirm this.
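
For example, something along these lines (a rough sketch: the "await"
column position varies across sysstat versions, and the 50 ms threshold
is just a guess to be tuned against healthy peers) can scan iostat -x
output for disks that look degraded:

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class SlowDiskCheck {
    public static void main(String[] args) throws Exception {
        // "iostat -x 1 2": the second sample reflects current rates,
        // not averages since boot
        Process p = new ProcessBuilder("iostat", "-x", "1", "2").start();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                String[] cols = line.trim().split("\\s+");
                // keep only per-device data lines (sdX / hdX)
                if (cols.length < 12
                        || !(cols[0].startsWith("sd") || cols[0].startsWith("hd")))
                    continue;
                // assumed layout: column 10 is "await", the average ms a
                // request spends queued plus being serviced
                double await = Double.parseDouble(cols[9]);
                if (await > 50.0)  // threshold is a guess; tune per cluster
                    System.out.println("possibly degraded: " + cols[0]
                            + " await=" + await + " ms");
            }
        }
        p.waitFor();
    }
}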
Also, comparing ext2 and ext3, ext3 did not show a noticeable slowdown.
Often the application's access patterns dictate disk performance more
than the native filesystem implementation itself. The filesystem would
probably matter more when dealing with a lot of small files.
Dennis Kubes wrote:
Can anyone who is running large clusters (50+ nodes) tell me what you
are seeing with hard disk failure rates? Something we are seeing is
that certain machines consistently have double or triple the load of
other machines running the same tasks. I believe it is due to some hard
disks beginning to fail; I just wanted to know if anyone else is seeing
similar behavior.
Dennis Kubes