[
https://issues.apache.org/jira/browse/HADOOP-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597250#action_12597250
]
lohit vijayarenu commented on HADOOP-3392:
------------------------------------------
>Is it reasonable to ask for a hadoop command line option to salvage
>non-truncated blocks with checksum errors? Otherwise, one would have to copy
>the corrupted blocks to the local filesystem (I overheard that this is possible
>in 0.17, correct?) and put them back into dfs.
Could you expand on what exactly salvage should do here? I am not sure we could
fetch a block with any existing command; you would have to find its locations
and go to the datanode that stores the actual block file.
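For reference, a rough sketch of the manual route being discussed, assuming
fs -get does accept an -ignoreCrc option in 0.17; the path, block id, and
datanode directory below are only illustrative:

  # find which datanodes hold the blocks of the affected file
  hadoop fsck /user/foo/part-00000 -files -blocks -locations

  # copy the file out while skipping client-side checksum verification,
  # then overwrite the corrupt copy in dfs
  hadoop fs -get -ignoreCrc /user/foo/part-00000 /tmp/part-00000
  hadoop fs -rm /user/foo/part-00000
  hadoop fs -put /tmp/part-00000 /user/foo/part-00000

  # or pull the raw block file straight off a datanode; block files are
  # kept under dfs.data.dir, e.g. .../current/blk_<id>
  scp datanode1:/data/dfs/data/current/blk_1234567890 /tmp/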
> Corrupted blocks leading to job failures
> ----------------------------------------
>
> Key: HADOOP-3392
> URL: https://issues.apache.org/jira/browse/HADOOP-3392
> Project: Hadoop Core
> Issue Type: Improvement
> Affects Versions: 0.16.0
> Reporter: Christian Kunz
>
> On one of our clusters we ended up with 11 singly-replicated corrupted blocks
> (checksum errors), and jobs were failing because no live blocks were
> available.
> fsck reports the system as healthy, although it is not.
> I argue that fsck should have an option to check whether under-replicated
> blocks are okay.
> Even better, the namenode should automatically check under-replicated blocks
> with repeated replication failures for corruption and list them somewhere on
> the GUI. And for checksum errors, there should be an option to undo the
> corruption and recompute the checksums.
> Question: Is it at all probable that two or more replicas of a block have
> checksum errors? If not, we could restrict the checking to singly-replicated
> blocks.
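A note on the fsck point above: fsck only consults block metadata on the
namenode, which is why it stays HEALTHY here. A full client-side read does
verify checksums, so something like the following (path illustrative) will
surface a checksum error on a singly-replicated corrupt block:

  # reading the file through the dfs client verifies checksums and fails
  # when no healthy replica of a block is left
  hadoop fs -cat /user/foo/part-00000 > /dev/null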
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.