[ https://issues.apache.org/jira/browse/HADOOP-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509974 ]
Doug Cutting commented on HADOOP-1557:
--------------------------------------

> a periodic disk block validation by the Datanode might be handy in detecting
> these types of problems

Yes, it would, especially if the filesystem has been idle or offline for a time. But for an actively used filesystem, normal use might identify failing drives just as effectively. Scanning the research on disk failures, it looks like drives more frequently return a read error than corrupt data. Currently, a datanode shuts down when it encounters a read error, which is probably sufficient: the OS shouldn't return a read error unless it has retried several times, the drive's ECC has failed, etc.

> Deletion of excess replicas should prefer to delete corrupted replicas before deleting valid replicas
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1557
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1557
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: dhruba borthakur
>
> Suppose a block has three replicas and two of the replicas are corrupted. If the replication factor of the file is then reduced to 2, the filesystem should preferably delete the two corrupted replicas; otherwise it could be left with a corrupted file.
> One option would be to make the datanode periodically validate all blocks against their corresponding CRCs. The other option would be to make the setReplication call validate existing replicas before deleting excess replicas.
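As a rough illustration of the first option, here is a minimal sketch of periodic block validation, assuming each block has a sidecar checksum file holding one CRC32 per fixed-size chunk of data. BlockScanner, Block, dataFile(), and checksumFile() are hypothetical names for this sketch, not actual Hadoop APIs.

{code:java}
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.zip.CRC32;

public class BlockScanner {
    // Bytes of block data covered by each stored CRC32 (an assumption of this sketch).
    private static final int CHUNK_SIZE = 512;

    // Hypothetical view of a block's on-disk files; not a Hadoop API.
    interface Block {
        File dataFile();     // the block contents
        File checksumFile(); // one 4-byte CRC32 per CHUNK_SIZE bytes of data
    }

    /** Recomputes each chunk's CRC32 and compares it to the stored value. */
    static boolean validate(Block block) throws IOException {
        try (DataInputStream data = new DataInputStream(new FileInputStream(block.dataFile()));
             DataInputStream sums = new DataInputStream(new FileInputStream(block.checksumFile()))) {
            byte[] chunk = new byte[CHUNK_SIZE];
            CRC32 crc = new CRC32();
            int n;
            while ((n = data.read(chunk)) > 0) {
                crc.reset();
                crc.update(chunk, 0, n);
                int stored;
                try {
                    stored = sums.readInt();
                } catch (EOFException e) {
                    return false; // checksum file truncated: treat the block as corrupt
                }
                if ((int) crc.getValue() != stored) {
                    return false; // stored and recomputed CRCs disagree
                }
            }
            return true;
        }
    }
}
{code}

A datanode could run something like this over its blocks at a low background rate and report any failures to the namenode.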
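And a minimal sketch of the deletion preference the issue asks for: when trimming a block to its target replication, excess replicas that failed validation are chosen for deletion before valid ones. Replica and isCorrupt() are likewise hypothetical names, not existing APIs.

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ExcessReplicaChooser {
    // Hypothetical replica handle; isCorrupt() would reflect a prior CRC validation.
    interface Replica {
        boolean isCorrupt();
    }

    /** Picks which replicas to delete, preferring corrupted ones. */
    static List<Replica> chooseForDeletion(List<Replica> replicas, int targetReplication) {
        int excess = replicas.size() - targetReplication;
        if (excess <= 0) {
            return new ArrayList<>(); // already at or below the target: delete nothing
        }
        List<Replica> ordered = new ArrayList<>(replicas);
        // false sorts before true, so corrupted replicas come first.
        ordered.sort(Comparator.comparing((Replica r) -> !r.isCorrupt()));
        return new ArrayList<>(ordered.subList(0, excess));
    }
}
{code}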