[ 
https://issues.apache.org/jira/browse/HDFS-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978371#comment-13978371
 ] 

Rushabh S Shah commented on HDFS-5773:
--------------------------------------

I wrote a test case.
Steps to reproduce:
1. Create a MiniDFSCluster with 1 namenode and 3 datanodes.
2. Set the heartbeat interval (DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY, 7) 
high.
3. Set the heartbeat recheck interval 
(DFSConfigKeys.DFS_NAMENODE_HEARTBEAT_RECHECK_INTERVAL_KEY) low.
4. Open a file.
5. Sleep long enough that the namenode declares the datanode dead, because the 
datanode has not heartbeated within the heartbeat recheck interval, and the 
datanode has sent the block report.
6. This generates an IOException with the following stack trace:
    java.io.IOException: Got blockReceivedDeleted message from unregistered or dead node
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blockReceivedAndDeleted(BlockManager.java:2238)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReceivedAndDeleted(NameNodeRpcServer.java:825)
7. However, when the datanode next heartbeated to the namenode (after the 
heartbeat interval), the namenode re-registered the datanode, added it back to 
the topology, and recovered from the exception.
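
The sequence in steps 5 to 7 can be sketched with a toy model. This is not
Hadoop code: ToyNameNode, its methods, and the millisecond values are all
hypothetical names and numbers chosen for illustration, and the real
re-registration exchange between namenode and datanode is collapsed into a
single heartbeat() call. It only shows the expected behaviour: a node marked
dead is rejected until its next heartbeat, after which it is accepted again.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for the namenode's view of datanodes; NOT Hadoop code. */
class ToyNameNode {
    private final long expireMillis;                       // hypothetical expiry window
    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    private final Map<String, Boolean> alive = new HashMap<>();

    ToyNameNode(long expireMillis) {
        this.expireMillis = expireMillis;
    }

    /** Registers (or re-registers) a datanode and marks it alive. */
    void register(String dn, long now) {
        lastHeartbeat.put(dn, now);
        alive.put(dn, true);
    }

    /** Heartbeat monitor pass: mark nodes dead whose last heartbeat is too old. */
    void recheck(long now) {
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (now - e.getValue() > expireMillis) {
                alive.put(e.getKey(), false);
            }
        }
    }

    /** Simplification: a heartbeat from a dead node re-registers it (step 7). */
    void heartbeat(String dn, long now) {
        register(dn, now);
    }

    /** Rejects block reports from nodes that are unknown or marked dead (step 6). */
    void blockReceivedAndDeleted(String dn) throws IOException {
        if (!isAlive(dn)) {
            throw new IOException(
                "Got blockReceivedDeleted message from unregistered or dead node");
        }
    }

    boolean isAlive(String dn) {
        return alive.getOrDefault(dn, false);
    }
}

class RejoinDemo {
    public static void main(String[] args) throws IOException {
        ToyNameNode nn = new ToyNameNode(1_000);
        nn.register("dn1", 0);
        nn.recheck(5_000);                  // no heartbeat for 5s: declared dead
        try {
            nn.blockReceivedAndDeleted("dn1");
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        nn.heartbeat("dn1", 7_000);         // late heartbeat: node rejoins
        nn.blockReceivedAndDeleted("dn1");  // accepted again, no exception
        System.out.println("alive after heartbeat: " + nn.isAlive("dn1"));
    }
}
```

The bug described in this jira would correspond to heartbeat() failing to set
the alive flag back to true, in which case the second
blockReceivedAndDeleted() call would keep throwing.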

So according to my test case, the namenode recovered as it should.
I was not able to reproduce the error described in this jira.

I am closing the jira; feel free to reopen it if this happens again.



> NN may reject formerly dead DNs
> -------------------------------
>
>                 Key: HDFS-5773
>                 URL: https://issues.apache.org/jira/browse/HDFS-5773
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.0-alpha, 3.0.0, 0.23.10
>            Reporter: Daryn Sharp
>            Assignee: Rushabh S Shah
>            Priority: Critical
>
> If the heartbeat monitor declares a node dead, it may never allow a DN to 
> rejoin.  The NN will generate messages like "Got blockReceivedDeleted message 
> from unregistered or dead node".
> There appears to be a bug where the isAlive flag is not set to true when 
> a formerly known DN attempts to rejoin.



--
This message was sent by Atlassian JIRA
(v6.2#6252)