[ https://issues.apache.org/jira/browse/HDFS-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906031#action_12906031 ]
Todd Lipcon commented on HDFS-1371:
-----------------------------------
I agree that "c" seems like an "incorrect feature". In my opinion, the DFSClient
should require an explicit configuration setting that says "allow reading corrupt
blocks" before it will do so.
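Just to make the opt-in concrete, here's a minimal sketch of what that gate could
look like. The key name dfs.client.read.corrupt.blocks is hypothetical (not an
existing HDFS config key); only Configuration.getBoolean() is real Hadoop API here:
{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;

public class CorruptReadGate {
  // Hypothetical key name; the actual key would be chosen when implementing this.
  static final String ALLOW_CORRUPT_READS = "dfs.client.read.corrupt.blocks";

  /** Fail the read unless the user has explicitly opted in to corrupt replicas. */
  static void checkCorruptReadAllowed(Configuration conf) throws IOException {
    if (!conf.getBoolean(ALLOW_CORRUPT_READS, false)) {
      throw new IOException("All replicas failed checksum verification and "
          + ALLOW_CORRUPT_READS + " is not enabled");
    }
  }
}
{code}
That way reading a corrupt replica is always a deliberate choice rather than
silent fallback behavior.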
Also, I think Hairong's solution makes sense - the client should send
OP_STATUS_ERROR_CHECKSUM back to the DN, and the DN could then add the block
to the front of the DataBlockScanner queue.
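As a rough illustration of that queue behavior (a toy model keyed on bare block
IDs, not the actual DataBlockScanner internals):
{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model: routine scans go to the tail, client-reported errors jump the queue. */
public class ScanQueueSketch {
  private final Deque<Long> scanQueue = new ArrayDeque<Long>();

  /** Periodic background verification appends to the tail as usual. */
  synchronized void scheduleRoutineScan(long blockId) {
    scanQueue.addLast(blockId);
  }

  /** A client's OP_STATUS_ERROR_CHECKSUM report moves the block to the front. */
  synchronized void scheduleUrgentScan(long blockId) {
    scanQueue.remove(blockId);   // don't scan the same block twice
    scanQueue.addFirst(blockId); // suspected-corrupt blocks get verified first
  }

  synchronized Long nextBlockToScan() {
    return scanQueue.pollFirst();
  }
}
{code}
The DN would then either confirm the corruption and report it to the NN, or clear
the suspicion right away instead of waiting for the next full scan cycle.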
It would be even better if the client reported the *offset* of the supposed
checksum error, so we could verify that chunk immediately rather than scanning
the full 64MB block, but that's more of a protocol change.
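For example (a sketch only - assuming CRC32 checksums over 512-byte chunks, the
HDFS defaults, with blockData and storedChecksums standing in for the replica
file and its .meta file):
{code:java}
import java.util.zip.CRC32;

public class ChunkVerifier {
  static final int BYTES_PER_CHECKSUM = 512; // default io.bytes.per.checksum

  /**
   * Verify only the chunk containing the reported offset, not the whole block.
   * Assumes reportedOffset falls within blockData.
   */
  static boolean chunkMatches(byte[] blockData, long[] storedChecksums,
                              long reportedOffset) {
    int chunk = (int) (reportedOffset / BYTES_PER_CHECKSUM);
    int start = chunk * BYTES_PER_CHECKSUM;
    int len = Math.min(BYTES_PER_CHECKSUM, blockData.length - start);
    CRC32 crc = new CRC32();
    crc.update(blockData, start, len);
    return crc.getValue() == storedChecksums[chunk];
  }
}
{code}
Checking one 512-byte chunk instead of 64MB makes the verification essentially
free, at the cost of extending the client-to-DN protocol message.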
> One bad node can incorrectly flag many files as corrupt
> -------------------------------------------------------
>
> Key: HDFS-1371
> URL: https://issues.apache.org/jira/browse/HDFS-1371
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs client, name-node
> Affects Versions: 0.20.1
> Environment: yahoo internal version
> [knogu...@gwgd4003 ~]$ hadoop version
> Hadoop 0.20.104.3.1007030707
> Reporter: Koji Noguchi
>
> On our cluster, 12 files were reported as corrupt by fsck even though the
> replicas on the datanodes were healthy.
> Turns out that all the replicas (12 files x 3 replicas per file) were
> reported corrupt from one node.
> Surprisingly, these files were still readable/accessible from dfsclient
> (-get/-cat) without any problems.