Li Bo created HDFS-9826:
---------------------------

Summary: Erasure Coding: Postpone the recovery work for a configurable time period
Key: HDFS-9826
URL: https://issues.apache.org/jira/browse/HDFS-9826
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Li Bo
Assignee: Li Bo

Currently the NameNode prepares recovery as soon as it finds an under-replicated block group. This is inefficient and takes resources away from other operations. It would be better to postpone the recovery work for a period of time when only one internal block is corrupted, considering the following points from papers such as [1][2]:

1. Transient errors in which no data is lost account for more than 90% of data center failures, owing to network partitions, software problems, or non-disk hardware faults.
2. Although erasure codes tolerate multiple simultaneous failures, single failures represent 99.75% of recoveries.

Different clusters may behave differently, so we should allow users to configure how long recoveries are postponed. A proper setting will avoid a large proportion of unnecessary recoveries. When multiple internal blocks in a block group are found to be corrupted, we do the recovery work immediately, because this case is very rare and we don't want to increase the risk of losing data.

[1] Availability in globally distributed storage systems
http://static.usenix.org/events/osdi10/tech/full_papers/Ford.pdf
[2] Rethinking erasure codes for cloud file systems: minimizing I/O for recovery and degraded reads
http://static.usenix.org/events/fast/tech/full_papers/Khan.pdf
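
The decision rule proposed in this issue could be sketched roughly as below. This is only an illustration under stated assumptions, not existing NameNode code: the class `RecoveryPostponePolicy`, its `decide` method, and the `postponeMillis` parameter are hypothetical names, and no configuration key is implied. A single corrupted internal block is recovered only once the configured delay has elapsed; more than one triggers recovery immediately.

```java
/**
 * Hypothetical sketch of the postponement rule described in this issue.
 * None of these names exist in HDFS; they are assumptions for illustration.
 */
final class RecoveryPostponePolicy {

    enum Decision { NO_RECOVERY_NEEDED, POSTPONE, RECOVER_NOW }

    /** User-configured delay before recovering a single corrupted internal block. */
    private final long postponeMillis;

    RecoveryPostponePolicy(long postponeMillis) {
        if (postponeMillis < 0) {
            throw new IllegalArgumentException("postponeMillis must be >= 0");
        }
        this.postponeMillis = postponeMillis;
    }

    /**
     * @param corruptedInternalBlocks number of corrupted internal blocks in the block group
     * @param firstDetectedMillis     time at which the corruption was first observed
     * @param nowMillis               current time, on the same clock as firstDetectedMillis
     */
    Decision decide(int corruptedInternalBlocks, long firstDetectedMillis, long nowMillis) {
        if (corruptedInternalBlocks <= 0) {
            return Decision.NO_RECOVERY_NEEDED;
        }
        if (corruptedInternalBlocks > 1) {
            // Multiple failures are rare; recover at once to avoid raising the risk of data loss.
            return Decision.RECOVER_NOW;
        }
        // Exactly one corrupted internal block: wait, since the error may be transient.
        long elapsed = nowMillis - firstDetectedMillis;
        return elapsed >= postponeMillis ? Decision.RECOVER_NOW : Decision.POSTPONE;
    }
}
```

With a delay of zero the sketch reduces to the current behavior of recovering immediately, so the postponement would remain opt-in for clusters that do not want it.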