[
https://issues.apache.org/jira/browse/HDFS-387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tsz Wo (Nicholas), SZE resolved HDFS-387.
-----------------------------------------
Resolution: Not A Problem
Closing. Please feel free to reopen this if this is still a problem.
> Corrupted blocks leading to job failures
> ----------------------------------------
>
> Key: HDFS-387
> URL: https://issues.apache.org/jira/browse/HDFS-387
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Christian Kunz
>
> On one of our clusters we ended up with 11 singly-replicated corrupted blocks
> (checksum errors), causing jobs to fail because no live blocks were
> available.
> fsck reports the system as healthy, although it is not.
> I argue that fsck should have an option to check whether under-replicated
> blocks are okay.
> Even better, the namenode should automatically check under-replicated blocks
> with repeated replication failures for corruption and list them somewhere on
> the GUI. And for checksum errors, there should be an option to undo the
> corruption and recompute the checksums.
> Question: Is it at all probable that two or more replicas of a block have
> checksum errors? If not, then we could restrict the checking to
> singly-replicated blocks.
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira