[ https://issues.apache.org/jira/browse/HADOOP-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509974 ]

Doug Cutting commented on HADOOP-1557:
--------------------------------------

> a periodic disk block validation by the Datanode might be handy in detecting 
> these types of problems

Yes, it would, especially if the filesystem has been idle or offline for a 
time.  But for an actively used filesystem, normal use might identify failing 
drives just as effectively.  Scanning the research on disk failures, it looks 
like failing drives more frequently return a read error than silently corrupt 
data.  Currently, it looks like a datanode shuts down when it encounters a 
read error, which is probably sufficient.  The OS shouldn't return a read 
error unless it has retried several times, the drive's ECC has failed, etc.
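
For illustration, here is a minimal sketch of what such a periodic CRC scan 
could look like.  It is plain Java, not the actual DataNode code: the blk_ 
file naming and the single-long ".crc" sidecar format are assumptions made 
for the example (the real DataNode keeps per-chunk checksums in a different 
on-disk layout), and a real scanner would throttle its I/O and report corrupt 
replicas to the namenode instead of just printing.

import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.zip.CRC32;

// Hypothetical periodic scanner: walks a block directory, recomputes each
// block's CRC32 and compares it with a ".crc" sidecar file.
public class BlockScanSketch {

    // Recompute the CRC32 of a block file's contents.
    static long computeCrc(File block) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buf = new byte[64 * 1024];
        try (FileInputStream in = new FileInputStream(block)) {
            int n;
            while ((n = in.read(buf)) > 0) {
                crc.update(buf, 0, n);
            }
        }
        return crc.getValue();
    }

    // Read the expected CRC from a hypothetical "<block>.crc" sidecar file.
    static long readStoredCrc(File block) throws IOException {
        File crcFile = new File(block.getParent(), block.getName() + ".crc");
        try (DataInputStream in =
                 new DataInputStream(new FileInputStream(crcFile))) {
            return in.readLong();
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(args[0]);   // block directory to scan
        for (File block : dir.listFiles((d, name) ->
                 name.startsWith("blk_") && !name.endsWith(".crc"))) {
            long expected = readStoredCrc(block);
            long actual = computeCrc(block);
            if (expected != actual) {
                // A real DataNode would mark the replica corrupt and report
                // it to the NameNode rather than just print.
                System.err.println("CORRUPT " + block + ": expected="
                                   + expected + " actual=" + actual);
            }
        }
    }
}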

> Deletion of excess replicas should prefer to delete corrupted replicas before 
> deleting valid replicas
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1557
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1557
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: dhruba borthakur
>
> Suppose a block has three replicas and two of the replicas are corrupted. If 
> the replication factor of the file is reduced to 2, the filesystem should 
> preferably delete the two corrupted replicas; otherwise it could lead to a 
> corrupted file.
> One option would be to make the datanode periodically validate all blocks 
> with their corresponding CRCs. The other option would be to make the 
> setReplication call validate existing replicas before deleting excess 
> replicas.
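
To make the second option above concrete, here is a minimal sketch of 
preferring known-corrupt replicas when choosing which excess replicas to 
drop.  ReplicaInfo, chooseExcess and the corrupt flag are illustrative names, 
not actual NameNode classes; the flag is assumed to have been set by an 
earlier CRC validation of the replica.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical excess-replica selection: when trimming a block's replicas
// down to the target replication, pick known-corrupt replicas first.
public class ExcessReplicaChooser {

    static class ReplicaInfo {
        final String datanode;
        final boolean corrupt;   // assumed to come from a prior CRC check
        ReplicaInfo(String datanode, boolean corrupt) {
            this.datanode = datanode;
            this.corrupt = corrupt;
        }
    }

    // Return the replicas to delete so that at most `target` remain,
    // preferring corrupt replicas over valid ones.
    static List<ReplicaInfo> chooseExcess(List<ReplicaInfo> replicas, int target) {
        List<ReplicaInfo> ordered = new ArrayList<>(replicas);
        // Corrupt replicas sort first, so they are chosen for deletion first.
        ordered.sort(Comparator.comparing((ReplicaInfo r) -> !r.corrupt));
        int excess = Math.max(0, replicas.size() - target);
        return ordered.subList(0, excess);
    }

    public static void main(String[] args) {
        List<ReplicaInfo> replicas = List.of(
            new ReplicaInfo("dn1", false),   // valid copy
            new ReplicaInfo("dn2", true),    // failed CRC check
            new ReplicaInfo("dn3", true));   // failed CRC check
        for (ReplicaInfo r : chooseExcess(replicas, 2)) {
            System.out.println("delete replica on " + r.datanode);  // dn2
        }
    }
}

With the scenario from the description (three replicas, two corrupt, target 
of 2), this drops one of the corrupt replicas rather than the valid one; a 
real system would additionally want to invalidate the remaining corrupt 
replica and re-replicate from the valid copy.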

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
