[ https://issues.apache.org/jira/browse/HDFS-387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tsz Wo (Nicholas), SZE resolved HDFS-387.
-----------------------------------------
    Resolution: Not A Problem

Closing. Please feel free to reopen this if this is still a problem.

> Corrupted blocks leading to job failures
> ----------------------------------------
>
>                 Key: HDFS-387
>                 URL: https://issues.apache.org/jira/browse/HDFS-387
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Christian Kunz
>
> On one of our clusters we ended up with 11 singly-replicated corrupted
> blocks (checksum errors), such that jobs were failing because no live
> blocks were available.
> fsck reports the system as healthy, although it is not.
> I argue that fsck should have an option to check whether under-replicated
> blocks are okay.
> Even better, the namenode should automatically check under-replicated
> blocks with repeated replication failures for corruption and list them
> somewhere on the GUI. And for checksum errors, there should be an option
> to undo the corruption and recompute the checksums.
> Question: Is it at all probable that two or more replicas of a block have
> checksum errors? If not, then we could reduce the checking to
> singly-replicated blocks.

--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
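As a stopgap until fsck grows such an option, the corruption described above can be surfaced from the client side: the DFS client verifies CRC checksums on every read, so streaming each suspect file end-to-end will flag a bad replica. Below is a minimal, hypothetical sketch; the class name BlockChecksumProbe and the idea of feeding it paths reported as under-replicated are illustrative assumptions, not a patch attached to this issue.

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ChecksumException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Reads each path given on the command line end-to-end. Because the HDFS
 * client verifies checksums on read, a corrupt replica shows up here even
 * when fsck reports the filesystem as healthy.
 */
public class BlockChecksumProbe {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    byte[] buf = new byte[64 * 1024];
    for (String arg : args) {
      Path path = new Path(arg);
      try (FSDataInputStream in = fs.open(path)) {
        while (in.read(buf) != -1) {
          // discard the data; we only care about checksum verification
        }
        System.out.println("OK      " + path);
      } catch (ChecksumException e) {
        // getPos() reports the offset at which verification failed
        System.out.println("CORRUPT " + path + " at offset " + e.getPos());
      } catch (IOException e) {
        // with a single corrupt replica, the client may instead give up
        // with a generic read failure after exhausting all datanodes
        System.out.println("FAILED  " + path + ": " + e.getMessage());
      }
    }
  }
}
{code}

Per the question above, a wrapper could restrict the probe to singly-replicated files; candidate files can be listed with the existing "hadoop fsck <path> -files -blocks -locations" options and filtered on their reported replication count.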