[ https://issues.apache.org/jira/browse/HDFS-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13746738#comment-13746738 ]

Konstantin Shvachko commented on HDFS-5077:
-------------------------------------------

- Your patch seems to implement approach 1, while you claim it is approach 2.
- It doesn't look like you actually eliminate the NPE: not assigning anything to
descriptors[i] means it will simply remain null.
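The second point can be illustrated with a small, hypothetical sketch (the `Node` class and `resolve` method are stand-ins, not Hadoop's DatanodeDescriptor or the patch's code). It assumes the NN resolves each reported DN through a map-like lookup that returns null for a DN it no longer considers active. Because a freshly allocated Java object array is already filled with nulls, skipping the assignment leaves the slot null, and any later dereference still throws.

```java
import java.util.HashMap;
import java.util.Map;

public class NullSlotDemo {
    // Hypothetical stand-in for a datanode descriptor.
    static final class Node {
        final String id;
        Node(String id) { this.id = id; }
        String getId() { return id; }
    }

    /** Resolves reported ids, skipping any id that is not in the active map. */
    static Node[] resolve(String[] reportedIds, Map<String, Node> active) {
        Node[] descriptors = new Node[reportedIds.length]; // all slots start as null
        for (int i = 0; i < reportedIds.length; i++) {
            Node n = active.get(reportedIds[i]); // null for an inactive DN
            if (n != null) {
                descriptors[i] = n; // skipped for inactive DNs, so the slot stays null
            }
        }
        return descriptors;
    }

    public static void main(String[] args) {
        Map<String, Node> active = new HashMap<>();
        active.put("dn1", new Node("dn1")); // dn2 was removed from the active set
        Node[] descriptors = resolve(new String[] {"dn1", "dn2"}, active);
        try {
            for (Node d : descriptors) {
                System.out.println(d.getId()); // dereferencing the dn2 slot throws
            }
        } catch (NullPointerException e) {
            System.out.println("NPE: descriptors[1] is null");
        }
    }
}
```

Running `main` prints `dn1` and then hits the NPE on the second slot, which is the same failure mode the issue describes.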

I think we should treat commitBlockSynchronization() as a partial block report
for each replica it reports. If a DN calls commitBlockSynchronization(), it has
already seen and evaluated all replicas of the block on all the DNs it reports,
so their state was consistent at the time of inspection. If the NN no longer
sees one of those DNs as active, it should remove that replica from the block,
just as it would when processing a block report. This makes the block
under-replicated, and the BlockManager will then take care of re-replicating it.
If all reported replicas are gone, that is very unfortunate, since we will have
a missing block. But forcing another recovery will not help in this case,
because there are no replicas left to recover.
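The proposal above can be sketched as follows. This is a hypothetical illustration, not the actual FSNamesystem or BlockManager code: `Node`, `Outcome`, and `applyReportedLocations` are invented names, and the replication factor of 3 in the demo is an assumption. Each reported DN is handled like an entry in a partial block report: an active DN contributes a replica, an inactive one is dropped instead of producing a null entry. Dropped replicas leave the block under-replicated for later re-replication; if nothing survives, the block is reported as missing and no further recovery is attempted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CommitSyncSketch {
    // Hypothetical stand-in for a datanode descriptor.
    static final class Node {
        final String id;
        Node(String id) { this.id = id; }
    }

    /** Replicas kept and dropped after applying the reported locations. */
    static final class Outcome {
        final List<Node> liveReplicas;
        final List<String> droppedIds;
        Outcome(List<Node> liveReplicas, List<String> droppedIds) {
            this.liveReplicas = liveReplicas;
            this.droppedIds = droppedIds;
        }
        /** No surviving replica: the block is missing; re-recovery would not help. */
        boolean isMissing() { return liveReplicas.isEmpty(); }
        /** Fewer replicas than expected: left for later re-replication. */
        boolean isUnderReplicated(int expected) { return liveReplicas.size() < expected; }
    }

    /** Treats each reported DN like an entry in a partial block report. */
    static Outcome applyReportedLocations(String[] reportedIds, Map<String, Node> active) {
        List<Node> live = new ArrayList<>();
        List<String> dropped = new ArrayList<>();
        for (String id : reportedIds) {
            Node n = active.get(id);
            if (n != null) {
                live.add(n);     // DN still active: keep its replica
            } else {
                dropped.add(id); // DN no longer active: remove replica, no null slot
            }
        }
        return new Outcome(live, dropped);
    }

    public static void main(String[] args) {
        Map<String, Node> active = new HashMap<>();
        active.put("dn1", new Node("dn1"));
        active.put("dn3", new Node("dn3")); // dn2 went dead before the call was processed
        Outcome o = applyReportedLocations(new String[] {"dn1", "dn2", "dn3"}, active);
        System.out.println("live=" + o.liveReplicas.size() + " dropped=" + o.droppedIds
                + " underReplicated=" + o.isUnderReplicated(3) + " missing=" + o.isMissing());
        // prints: live=2 dropped=[dn2] underReplicated=true missing=false
    }
}
```

Returning lists rather than a fixed-size array means a dead DN simply never appears in the result, so there is no slot left behind for later code to dereference.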
                
> NPE in FSNamesystem.commitBlockSynchronization()
> ------------------------------------------------
>
>                 Key: HDFS-5077
>                 URL: https://issues.apache.org/jira/browse/HDFS-5077
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.0.5-alpha
>            Reporter: Konstantin Shvachko
>            Assignee: Plamen Jeliazkov
>         Attachments: HDFS-5077.patch
>
>
> NN starts a block recovery, which synchronizes block replicas on different
> DNs. At the end, one of the DNs reports the list of nodes containing the
> consistent replicas to the NN via the commitBlockSynchronization() call. The
> NPE happens if, just before processing commitBlockSynchronization(), the NN
> removes from its active set one of the DNs that is then reported in the call.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira