[ https://issues.apache.org/jira/browse/HADOOP-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Chansler updated HADOOP-3392:
------------------------------------
Component/s: dfs
> Corrupted blocks leading to job failures
> ----------------------------------------
>
> Key: HADOOP-3392
> URL: https://issues.apache.org/jira/browse/HADOOP-3392
> Project: Hadoop Core
> Issue Type: Improvement
> Components: dfs
> Affects Versions: 0.16.0
> Reporter: Christian Kunz
>
> On one of our clusters we ended up with 11 singly-replicated corrupted blocks
> (checksum errors), and jobs were failing because no live replicas of those
> blocks were available.
> fsck reports the system as healthy, although it is not.
> I argue that fsck should have an option to verify that under-replicated
> blocks are actually readable (free of checksum errors).
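>
> A minimal sketch of such a check, assuming only the standard client APIs
> (FileSystem, ChecksumException); the class name and the idea of feeding it a
> list of suspect files are hypothetical:
>
>   import java.io.IOException;
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.fs.ChecksumException;
>   import org.apache.hadoop.fs.FSDataInputStream;
>   import org.apache.hadoop.fs.FileSystem;
>   import org.apache.hadoop.fs.Path;
>
>   // Read every byte of a file so that a checksum error in any of its blocks
>   // surfaces as a ChecksumException, the same way a running task would hit it.
>   public class VerifyReadable {
>     static boolean isReadable(FileSystem fs, Path file) throws IOException {
>       byte[] buf = new byte[64 * 1024];
>       FSDataInputStream in = fs.open(file);
>       try {
>         while (in.read(buf) != -1) {
>           // keep streaming; checksums are verified as the data is read
>         }
>         return true;
>       } catch (ChecksumException ce) {
>         System.err.println("Corrupt data in " + file + ": " + ce.getMessage());
>         return false;
>       } finally {
>         in.close();
>       }
>     }
>
>     public static void main(String[] args) throws IOException {
>       FileSystem fs = FileSystem.get(new Configuration());
>       for (String arg : args) {
>         System.out.println(arg + " readable=" + isReadable(fs, new Path(arg)));
>       }
>     }
>   }
>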
> Even better, the namenode should automatically check under-replicated blocks
> that repeatedly fail replication for corruption and list them on the web UI.
> And for checksum errors, there should be an option to accept the stored data
> as-is and recompute the checksums.
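>
> A client-side approximation of that salvage step, assuming
> FileSystem.setVerifyChecksum is available in the deployed release (the
> ".salvaged" output name is hypothetical): read the bytes with verification
> disabled and rewrite them, so that fresh checksums are computed for whatever
> data is actually stored.
>
>   import java.io.IOException;
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.fs.FSDataInputStream;
>   import org.apache.hadoop.fs.FSDataOutputStream;
>   import org.apache.hadoop.fs.FileSystem;
>   import org.apache.hadoop.fs.Path;
>
>   // Salvage a file whose only replica fails checksum verification: copy the
>   // stored bytes with verification turned off; the rewrite generates new
>   // checksums as a side effect.
>   public class SalvageFile {
>     public static void main(String[] args) throws IOException {
>       FileSystem fs = FileSystem.get(new Configuration());
>       fs.setVerifyChecksum(false);                // accept the stored bytes as-is
>       Path src = new Path(args[0]);
>       Path dst = new Path(args[0] + ".salvaged"); // hypothetical output path
>       FSDataInputStream in = fs.open(src);
>       FSDataOutputStream out = fs.create(dst);
>       byte[] buf = new byte[64 * 1024];
>       int n;
>       while ((n = in.read(buf)) != -1) {
>         out.write(buf, 0, n);                     // rewriting computes fresh checksums
>       }
>       in.close();
>       out.close();
>     }
>   }
>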
> Question: Is it at all likely that two or more replicas of a block have
> checksum errors? If not, we could limit the check to singly-replicated
> blocks.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.