[
https://issues.apache.org/jira/browse/HDFS-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194122#comment-13194122
]
Sanjay Radia commented on HDFS-2791:
------------------------------------
After reading HDFS-2742, I am coming to the conclusion that when a NN asks a DN
to delete a replica, it should include, in addition to the block ID and
generation stamp, the replica state (RBW, etc.) known to the NN. The DN deletes
the replica only if it is in that state. I think this will catch some of the
race conditions and prevent a finalized replica from being incorrectly deleted.
If folks think this is a good idea, I will file a new Jira and make that change.
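
To make the proposal concrete, here is a minimal sketch of a state-checked
delete. It is not actual HDFS code: the class, enum, and method names
(StateCheckedDeleteSketch, DataNodeSketch, deleteIfMatches, and so on) are
hypothetical, and the replica states are reduced to the two mentioned here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; none of these names are real HDFS classes or APIs.
class StateCheckedDeleteSketch {
    // Simplified replica states: only the two discussed in this thread.
    enum ReplicaState { RBW, FINALIZED }

    // What the DN holds locally for one replica.
    static final class Replica {
        final long genStamp;
        final ReplicaState state;

        Replica(long genStamp, ReplicaState state) {
            this.genStamp = genStamp;
            this.state = state;
        }
    }

    // Stand-in for a DN's replica map, keyed by block ID.
    static final class DataNodeSketch {
        private final Map<Long, Replica> replicas = new HashMap<>();

        void put(long blockId, long genStamp, ReplicaState state) {
            replicas.put(blockId, new Replica(genStamp, state));
        }

        boolean has(long blockId) {
            return replicas.containsKey(blockId);
        }

        // Delete only if block ID, generation stamp, AND the state the NN
        // believes the replica is in all match the DN's local view.
        boolean deleteIfMatches(long blockId, long genStamp, ReplicaState expectedState) {
            Replica r = replicas.get(blockId);
            if (r == null || r.genStamp != genStamp || r.state != expectedState) {
                return false; // stale or mismatched request: refuse to delete
            }
            replicas.remove(blockId);
            return true;
        }
    }

    public static void main(String[] args) {
        DataNodeSketch dn = new DataNodeSketch();
        dn.put(1L, 100L, ReplicaState.FINALIZED);

        // The NN still believes the replica is RBW, so the delete is refused.
        System.out.println("stale delete applied: "
                + dn.deleteIfMatches(1L, 100L, ReplicaState.RBW));
        System.out.println("replica still present: " + dn.has(1L));

        // A request carrying the matching state goes through.
        System.out.println("matching delete applied: "
                + dn.deleteIfMatches(1L, 100L, ReplicaState.FINALIZED));
    }
}
```

The point of the extra argument is that a delete request built from a stale
NN view becomes a no-op on the DN instead of removing a finalized replica.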
> If block report races with closing of file, replica is incorrectly marked
> corrupt
> ---------------------------------------------------------------------------------
>
> Key: HDFS-2791
> URL: https://issues.apache.org/jira/browse/HDFS-2791
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: data-node, name-node
> Affects Versions: 0.22.0, 0.23.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Attachments: hdfs-2791-test.txt, hdfs-2791.txt, hdfs-2791.txt,
> hdfs-2791.txt, hdfs-2791.txt
>
>
> The following sequence of events results in a replica mistakenly marked
> corrupt:
> 1. Pipeline is open with 2 replicas
> 2. DN1 generates a block report but is slow in sending it to the NN (e.g.,
> due to a flaky network). It gets "stuck" right before the block report RPC.
> 3. Client closes the file.
> 4. DN2 is fast and sends blockReceived to the NN. The NN marks the block as
> COMPLETE.
> 5. DN1's block report proceeds, and includes the block in an RBW state.
> 6. (x) NN incorrectly marks the replica as corrupt, since it is an RBW
> replica on a COMPLETE block.
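
The sequence above can be replayed as a small, single-threaded sketch. This is
an illustration rather than NN code: BlockReportRaceSketch, its enums, and
markedCorrupt are hypothetical names, the check is reduced to the rule stated
in step 6, and the assumption that DN1's replica is actually finalized once
the file closes is mine, not stated in the description.

```java
// Hypothetical sketch only; none of these names are real HDFS classes or APIs.
class BlockReportRaceSketch {
    enum BlockState { UNDER_CONSTRUCTION, COMPLETE }
    enum ReplicaState { RBW, FINALIZED }

    // Simplified version of the rule in step 6: an RBW replica reported
    // for a COMPLETE block is treated as corrupt.
    static boolean markedCorrupt(BlockState nnBlockState, ReplicaState reported) {
        return nnBlockState == BlockState.COMPLETE && reported == ReplicaState.RBW;
    }

    public static void main(String[] args) {
        // Steps 1-2: pipeline is open; DN1 snapshots its replica state for a
        // block report, then stalls before the RPC.
        BlockState nnView = BlockState.UNDER_CONSTRUCTION;
        ReplicaState dn1Snapshot = ReplicaState.RBW;

        // Steps 3-4: the client closes the file, DN2 sends blockReceived,
        // and the NN marks the block COMPLETE.
        nnView = BlockState.COMPLETE;
        // Assumption: DN1's replica is finalized by now, but its report is stale.
        ReplicaState dn1Actual = ReplicaState.FINALIZED;

        // Steps 5-6: the stale report arrives and is judged against the new NN state.
        System.out.println("stale report marked corrupt: "
                + markedCorrupt(nnView, dn1Snapshot));
        System.out.println("fresh report marked corrupt: "
                + markedCorrupt(nnView, dn1Actual));
    }
}
```

The replica itself is healthy; only the snapshot in the delayed report is out
of date relative to the NN's view of the block.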