[ https://issues.apache.org/jira/browse/HDFS-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906031#action_12906031 ]

Todd Lipcon commented on HDFS-1371:
-----------------------------------

I agree that "c" seems like an "incorrect feature". The DFSClient should have 
to have
a configuration set to say "allow reading corrupt blocks" in my opinion.
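
For concreteness, a rough sketch of such a gate in the client read path.
Everything here is made up for illustration (the config key, the class, the
method); no such setting exists today:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ChecksumException;

// Hypothetical opt-in gate: only hand corrupt data back to the caller
// when the operator explicitly asked for it in the configuration.
class CorruptReadPolicy {
  // Made-up key name for this sketch.
  static final String ALLOW_CORRUPT_KEY = "dfs.client.read.allow-corrupt";

  private final boolean allowCorrupt;

  CorruptReadPolicy(Configuration conf) {
    allowCorrupt = conf.getBoolean(ALLOW_CORRUPT_KEY, false);
  }

  // Invoked when every located replica has failed checksum verification.
  void onAllReplicasCorrupt(String blockId) throws ChecksumException {
    if (!allowCorrupt) {
      throw new ChecksumException("All replicas of " + blockId
          + " failed checksum verification; set " + ALLOW_CORRUPT_KEY
          + "=true to read the data anyway", 0);
    }
    // Otherwise fall through and return the (possibly corrupt) bytes.
  }
}
{code}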

I also think Hairong's solution makes sense: the client should send
OP_STATUS_ERROR_CHECKSUM back to the DN, and the DN could then add the block
to the front of the DatanodeBlockScanner queue.
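
Roughly, the client-side reply could look like this; the status constant
exists in DataTransferProtocol, but sending it from the read path on a
checksum failure (and the DN-side requeue call) is the proposed, not
existing, behavior:

{code:java}
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.hadoop.hdfs.protocol.DataTransferProtocol;

// Sketch: instead of just disconnecting on a checksum failure, the reader
// reports it back to the datanode that served the block.
class ChecksumFailureReporter {
  static void reportToDatanode(DataOutputStream replyOut) throws IOException {
    // Constant location assumed from the 0.20 source tree.
    replyOut.writeShort(DataTransferProtocol.OP_STATUS_ERROR_CHECKSUM);
    replyOut.flush();
  }
}

// Datanode side (hypothetical API): on receiving the status, move the block
// to the front of the scanner queue so it gets verified next, e.g.
//   blockScanner.verifyFirst(block);
{code}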

It would be even better if the client reported the *offset* of the supposed
checksum error, so we could verify it immediately rather than scanning the
full 64M block, but that's more of a protocol change.
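
A rough sketch of what that targeted check could look like on the DN,
assuming CRC32 checksums and a 7-byte .meta header (2-byte version + 1-byte
checksum type + 4-byte bytesPerChecksum) followed by one 4-byte CRC per
chunk; all names here are made up and the on-disk layout should be checked
against BlockMetadataHeader:

{code:java}
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.zip.CRC32;

// Verify only the bytesPerChecksum-sized chunk containing the offset the
// client reported, instead of re-reading the whole 64MB block.
class TargetedVerifier {
  static boolean chunkIsCorrupt(String blockFile, String metaFile,
                                long reportedOffset, int bytesPerChecksum)
      throws IOException {
    long chunkIndex = reportedOffset / bytesPerChecksum;

    // Read just the one data chunk from the block file.
    byte[] chunk;
    try (RandomAccessFile blockIn = new RandomAccessFile(blockFile, "r")) {
      long start = chunkIndex * (long) bytesPerChecksum;
      int len = (int) Math.min(bytesPerChecksum, blockIn.length() - start);
      chunk = new byte[len];
      blockIn.seek(start);
      blockIn.readFully(chunk);
    }

    // Read the stored CRC for that chunk from the .meta file.
    // The 7-byte header size is an assumption about the on-disk format.
    long storedCrc;
    try (RandomAccessFile metaIn = new RandomAccessFile(metaFile, "r")) {
      metaIn.seek(7 + chunkIndex * 4);
      storedCrc = metaIn.readInt() & 0xFFFFFFFFL;
    }

    CRC32 crc = new CRC32();
    crc.update(chunk, 0, chunk.length);
    return crc.getValue() != storedCrc;
  }
}
{code}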

> One bad node can incorrectly flag many files as corrupt
> -------------------------------------------------------
>
>                 Key: HDFS-1371
>                 URL: https://issues.apache.org/jira/browse/HDFS-1371
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client, name-node
>    Affects Versions: 0.20.1
>         Environment: yahoo internal version 
> [knogu...@gwgd4003 ~]$ hadoop version
> Hadoop 0.20.104.3.1007030707
>            Reporter: Koji Noguchi
>
> On our cluster, 12 files were reported as corrupt by fsck even though the 
> replicas on the datanodes were healthy.
> Turns out that all the replicas (12 files x 3 replicas per file) were 
> reported corrupt from one node.
> Surprisingly, these files were still readable/accessible from dfsclient 
> (-get/-cat) without any problems.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
