[ https://issues.apache.org/jira/browse/HADOOP-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597250#action_12597250 ]

lohit vijayarenu commented on HADOOP-3392:
------------------------------------------

>Is it reasonable to ask for a hadoop command line option to salvage 
>non-truncated blocks with checksum errors? Otherwise, one would have to copy 
>the corrupted blocks to the local filesystem (I overheard that this is 
>possible in 0.17, correct?) and put them back into dfs.

Could you expand on what exactly salvage should do here? I am not sure we would 
be able to get a block using any command, unless you find its locations and go 
to the datanode to retrieve the actual block file stored there.
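
For concreteness, a rough sketch of how that client-side workaround could look. 
This is purely illustrative: the paths are hypothetical, and it assumes 
FileSystem.setVerifyChecksum() is honored by the DFS client (the same mechanism 
the shell's -ignoreCrc option relies on).

    import java.util.Arrays;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SalvageSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical corrupted file plus scratch locations.
        Path corrupt = new Path("/data/part-00042");
        Path local = new Path("/tmp/part-00042");
        Path restored = new Path("/data/part-00042.salvaged");

        // Print where the replicas live, in case the raw block files need to
        // be inspected directly on the datanodes.
        FileStatus st = fs.getFileStatus(corrupt);
        for (BlockLocation loc : fs.getFileBlockLocations(st, 0, st.getLen())) {
          System.out.println("offset " + loc.getOffset() + " -> "
              + Arrays.toString(loc.getHosts()));
        }

        // Pull the data out without verifying checksums, then write it back
        // so that fresh checksums are computed for the new copy.
        fs.setVerifyChecksum(false);
        fs.copyToLocalFile(corrupt, local);
        fs.setVerifyChecksum(true);
        fs.copyFromLocalFile(local, restored);
      }
    }

At the shell level the rough equivalent would presumably be 'bin/hadoop fs -get 
-ignoreCrc <src> <localdst>' followed by a put, which I believe is the 0.17 
behavior the quoted comment refers to.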

> Corrupted blocks leading to job failures
> ----------------------------------------
>
>                 Key: HADOOP-3392
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3392
>             Project: Hadoop Core
>          Issue Type: Improvement
>    Affects Versions: 0.16.0
>            Reporter: Christian Kunz
>
> On one of our clusters we ended up with 11 singly-replicated corrupted blocks 
> (checksum errors), such that jobs were failing because no live blocks were 
> available.
> fsck reports the system as healthy, although it is not.
> I argue that fsck should have an option to check whether under-replicated 
> blocks are okay.
> Even better, the namenode should automatically check under-replicated blocks 
> with repeated replication failures for corruption and list them somewhere on 
> the GUI. And for checksum errors, there should be an option to undo the 
> corruption and recompute the checksums.
> Question: Is it at all probable that two or more replicas of a block have 
> checksum errors? If not, then we could reduce the checking to 
> singly-replicated blocks.
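
On the last question, a back-of-the-envelope estimate (purely illustrative, and 
assuming replica corruptions are independent): if a single replica has a 
checksum error with probability p, then all r replicas of a block are corrupt 
with probability roughly p^r.

    P(all replicas corrupt) = p^r
    e.g. p = 1e-3 per replica, r = 3  =>  1e-9 per block

With the default replication factor that should be rare, which supports 
restricting the extra checking to singly-replicated blocks as suggested above. 
The numbers here are made up to show the shape of the argument, not 
measurements.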

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
