[ https://issues.apache.org/jira/browse/HDFS-387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE resolved HDFS-387.
-----------------------------------------

    Resolution: Not A Problem

Closing. Please feel free to reopen if this is still a problem.

> Corrupted blocks leading to job failures
> ----------------------------------------
>
>                 Key: HDFS-387
>                 URL: https://issues.apache.org/jira/browse/HDFS-387
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Christian Kunz
>
> On one of our clusters we ended up with 11 singly-replicated corrupted blocks 
> (checksum errors), and jobs were failing because no live blocks were 
> available.
> fsck reports the system as healthy, although it is not.
> I argue that fsck should have an option to check whether under-replicated 
> blocks are okay (a client-side sketch of such a check follows below the 
> quoted report).
> Even better, the namenode should automatically check under-replicated blocks 
> with repeated replication failures for corruption and list them somewhere on 
> the GUI. And for checksum errors, there should be an option to accept the 
> surviving block contents as-is and recompute the checksums.
> Question: Is it at all probable that two or more replicas of a block have 
> checksum errors? If not, then we could restrict the checking to 
> singly-replicated blocks (a rough estimate follows below).
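
Regarding the fsck option suggested above: the check can be approximated from
a client with the public FileSystem API. The Java sketch below is a minimal
illustration under stated assumptions (the class name UnderReplicatedCheck,
the scan root "/", and running with the standard Hadoop client classes on the
classpath are all illustrative, not anything shipped with HDFS). It walks the
namespace, flags files whose blocks report fewer hosts than their replication
factor, and re-reads those files so the client-side checksum verification can
surface corruption.

    // Illustrative sketch only; not an HDFS tool or the proposed fsck option.
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class UnderReplicatedCheck {
      public static void main(String[] args) throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        check(fs, new Path("/"));   // scan root is an assumption
      }

      static void check(FileSystem fs, Path dir) throws IOException {
        for (FileStatus st : fs.listStatus(dir)) {
          if (st.isDir()) {
            check(fs, st.getPath());
            continue;
          }
          // A block reporting fewer hosts than the file's target replication
          // is under-replicated; only such files are re-read.
          boolean underReplicated = false;
          for (BlockLocation b :
               fs.getFileBlockLocations(st, 0, st.getLen())) {
            if (b.getHosts().length < st.getReplication()) {
              underReplicated = true;
              break;
            }
          }
          if (!underReplicated) continue;
          // Reading the file end-to-end makes the DFS client verify block
          // checksums; a checksum failure on the last live replica surfaces
          // here as an IOException.
          byte[] buf = new byte[64 * 1024];
          FSDataInputStream in = fs.open(st.getPath());
          try {
            while (in.read(buf) >= 0) { /* discard data */ }
            System.out.println("OK:      " + st.getPath());
          } catch (IOException e) {
            System.out.println("SUSPECT: " + st.getPath()
                + " (" + e.getMessage() + ")");
          } finally {
            in.close();
          }
        }
      }
    }

Re-reading every under-replicated file is expensive, which is one reason an
fsck option or a namenode-side check, as proposed above, would be preferable
to a client-side sweep like this.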
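On the closing question, a rough estimate. Assuming checksum errors hit
replicas independently with some small per-replica probability p (both the
independence and any particular value of p are assumptions, not measurements
from this cluster), then for replication factor r:

    \[
      P(\text{all } r \text{ replicas corrupt}) = p^r, \qquad
      P(\text{at least 2 of } r \text{ corrupt})
        = \sum_{k=2}^{r} \binom{r}{k} p^k (1-p)^{r-k}
        \approx \binom{r}{2} p^2 \quad \text{for small } p.
    \]

With r = 3 and, say, p = 10^-4, the two-or-more case is on the order of
3 x 10^-8, while a singly-replicated block is corrupt with probability p
itself. Under the independence assumption, restricting the check to
singly-replicated blocks therefore covers the dominant failure mode;
correlated causes (for example, a bug in the write pipeline corrupting all
replicas of a block) would break that assumption.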

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
