[ 
https://issues.apache.org/jira/browse/HDFS-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195897#comment-13195897
 ] 

Eli Collins commented on HDFS-2791:
-----------------------------------

bq. Eli: >... overloading the notion of a corrupt block, eg it doesn't seem 
like receiving a RBR for a complete block implies that block is corrupt
bq. Eli, this patch ignores an RBW with the same generation stamp but marks 
RWR as corrupt. Do you mean RWR should not be marked as corrupt (i.e. a typo, 
s^RBR^RWR^)?

Sanjay: correct, I made a typo, s/RBR/RWR/. I meant that we are considering 
RWR blocks corrupt even though a complete block under recovery is not 
necessarily corrupt, right? I.e., we're not marking RWR blocks corrupt here 
because we think they are actually corrupt, but because we want them to be 
deleted. This is a separate issue from this patch, since the patch does not 
change the behavior for the RWR case; however, I mentioned it here because it 
is an example of where considering blocks corrupt that are not actually 
corrupt bit us. Make sense?
                
> If block report races with closing of file, replica is incorrectly marked 
> corrupt
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-2791
>                 URL: https://issues.apache.org/jira/browse/HDFS-2791
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node, name-node
>    Affects Versions: 0.22.0, 0.23.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: 0.24.0, 0.23.1
>
>         Attachments: hdfs-2791-test.txt, hdfs-2791.txt, hdfs-2791.txt, 
> hdfs-2791.txt, hdfs-2791.txt
>
>
> The following sequence of events results in a replica mistakenly marked 
> corrupt:
> 1. Pipeline is open with 2 replicas
> 2. DN1 generates a block report but is slow in sending it to the NN (e.g. 
> a flaky network). It gets "stuck" right before the block report RPC.
> 3. Client closes the file.
> 4. DN2 is fast and sends blockReceived to the NN. NN marks the block as 
> COMPLETE.
> 5. DN1's block report proceeds, and includes the block in an RBW state.
> 6. (x) NN incorrectly marks the replica as corrupt, since it is an RBW 
> replica on a COMPLETE block.
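
The sequence above, together with the RBW/RWR distinction discussed in the 
comment, can be sketched as a small decision function. This is a simplified 
illustration only, not the actual NameNode code: the class, enum, and method 
names are hypothetical, and the handling of a mismatched generation stamp and 
of FINALIZED replicas is an assumption made to keep the sketch complete.

```java
// Hypothetical, simplified model of how the NN might treat a reported replica.
// None of these names come from Hadoop; they exist only for illustration.
class BlockReportRaceSketch {
    enum ReplicaState { FINALIZED, RBW, RWR }
    enum BlockState { UNDER_CONSTRUCTION, COMPLETE }
    enum Action { ACCEPT, IGNORE, MARK_CORRUPT }

    // Behavior described in the issue: any non-finalized replica reported
    // for a COMPLETE block is treated as corrupt (step 6 of the sequence).
    static Action beforeFix(BlockState stored, ReplicaState reported) {
        if (stored == BlockState.COMPLETE && reported != ReplicaState.FINALIZED) {
            return Action.MARK_CORRUPT;
        }
        return Action.ACCEPT;
    }

    // Behavior described in the comment: an RBW replica with the same
    // generation stamp is ignored (the report is probably just stale),
    // while an RWR replica is still marked corrupt so it gets deleted.
    static Action afterFix(BlockState stored, long storedGenStamp,
                           ReplicaState reported, long reportedGenStamp) {
        if (stored != BlockState.COMPLETE) {
            return Action.ACCEPT;
        }
        if (reported == ReplicaState.FINALIZED) {
            // Assumption: generation stamp checks for FINALIZED replicas
            // are out of scope for this sketch.
            return Action.ACCEPT;
        }
        if (reported == ReplicaState.RBW && storedGenStamp == reportedGenStamp) {
            return Action.IGNORE;
        }
        // RWR, or (by assumption) RBW with a different generation stamp.
        return Action.MARK_CORRUPT;
    }
}
```

With these definitions, DN1's delayed report in step 5 maps to 
beforeFix(COMPLETE, RBW), which yields MARK_CORRUPT, whereas afterFix with 
equal generation stamps yields IGNORE. An RWR replica yields MARK_CORRUPT in 
both versions, which is the unchanged behavior the comment refers to.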

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
