[
https://issues.apache.org/jira/browse/HDFS-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505151#comment-13505151
]
Tsz Wo (Nicholas), SZE commented on HDFS-3875:
----------------------------------------------
> ... Nicholas, any comments on if this applies to old pipeline vs new pipeline?
Both the old and the new pipelines should have a similar problem: when
machine A sends some data to machine B and the transfer fails, it is generally
impossible to tell whether A, B, or the network is at fault. Of course, we can
detect it in some special cases, such as when one of the machines is dead.
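To illustrate the point, here is a minimal sketch of per-chunk verification at
the receiving datanode. It is not the actual BlockReceiver code, and the class
and method names are made up for illustration; it only shows that the receiver
recomputes a checksum and compares it with the one carried in the packet, so a
mismatch by itself cannot say where the corruption happened.
{code:java}
import java.util.zip.CRC32;

// Minimal sketch of per-chunk verification at the receiving datanode.
// (Illustrative only; not the real BlockReceiver implementation.)
public class ChunkVerifier {
  // A mismatch only proves that the data and the checksum disagree.
  // It cannot tell whether the client computed a bad checksum, the
  // upstream node corrupted the data, or the network link corrupted it.
  static boolean verifyChunk(byte[] chunk, int off, int len, long expectedCrc) {
    CRC32 crc = new CRC32();
    crc.update(chunk, off, len);
    return crc.getValue() == expectedCrc;
  }
}
{code}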
> Potential blocker for 2.0.3-alpha.
I would say that this is not a blocker for 2.0.3-alpha since this is not a
regression.
> Issue handling checksum errors in write pipeline
> ------------------------------------------------
>
> Key: HDFS-3875
> URL: https://issues.apache.org/jira/browse/HDFS-3875
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: data-node, hdfs client
> Affects Versions: 2.0.2-alpha
> Reporter: Todd Lipcon
> Assignee: Kihwal Lee
> Priority: Blocker
>
> We saw this issue with one block in a large test cluster. The client was
> storing the data with replication level 2 (a sketch of this write pattern
> follows the description), and we saw the following:
> - The second node in the pipeline detected a checksum error on the data it
> received from the first node. We don't know if the client sent a bad
> checksum, or if it got corrupted between node 1 and node 2 in the pipeline.
> - This caused the second node to get kicked out of the pipeline, since it
> threw an exception. The pipeline started up again with only one replica (the
> first node in the pipeline).
> - This replica was later determined to be corrupt by the block scanner, and
> was unrecoverable since it was the only replica.
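For context, a minimal sketch of the client-side write pattern described above,
assuming an HDFS cluster reachable through the default Configuration; the path,
buffer size, and block size below are placeholders chosen for illustration:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationTwoWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Create the file with replication level 2, so the write pipeline has
    // only two datanodes; losing one of them to a checksum error leaves a
    // single, possibly corrupt, replica.
    Path path = new Path("/tmp/repl2-example");              // placeholder path
    FSDataOutputStream out = fs.create(path, true, 4096,
        (short) 2, 128L * 1024 * 1024);                      // 128MB block size

    out.write(new byte[64 * 1024]);
    out.hflush();  // push the data down the two-node pipeline
    out.close();
  }
}
{code}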