[ https://issues.apache.org/jira/browse/HADOOP-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494480 ]
Hadoop QA commented on HADOOP-1255:
-----------------------------------

Integrated in Hadoop-Nightly #83 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/83/)

> Name-node falls into infinite loop trying to remove a dead node.
> ----------------------------------------------------------------
>
> Key: HADOOP-1255
> URL: https://issues.apache.org/jira/browse/HADOOP-1255
> Project: Hadoop
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.12.3
> Reporter: Konstantin Shvachko
> Assigned To: Hairong Kuang
> Priority: Blocker
> Fix For: 0.13.0
>
> Attachments: heartbeat.patch, heartbeat.patch
>
>
> Under certain conditions the name-node falls into an infinite loop in heartbeatCheck().
> It's rather hard to reproduce. I'm running a one-node cluster: 1 name-node, 1 data-node.
> The data-node dies, and 10 minutes later I get
> 07/04/12 10:40:34 INFO net.NetworkTopology: Removing a node: /default-rack/0.0.0.0:50077
> 07/04/12 10:44:35 INFO dfs.StateChange: BLOCK* NameSystem.heartbeatCheck: lost heartbeat from 0.0.0.0:50077
> ...................................................
> 07/04/12 10:45:17 INFO net.NetworkTopology: Removing a node: /default-rack/0.0.0.0:50077
> 07/04/12 10:47:44 INFO dfs.StateChange: BLOCK* NameSystem.heartbeatCheck: lost heartbeat from 0.0.0.0:50077
> Here is what I see in the debugger:
> FSNamesystem.heartbeats contains 2 identical (same instance) DatanodeDescriptor entries, both of which have DatanodeDescriptor.isAlive = false.
> heartbeatCheck() correctly detects that there is a dead node in the list, but removeDatanode() does not delete the node from heartbeats because it is already marked dead.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
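
The loop described in the report can be reproduced in miniature. The following is a minimal sketch, not Hadoop's actual code: the class HeartbeatLoopSketch, the Node type, and the maxPasses cap are hypothetical stand-ins, and using the isAlive flag as the dead-node test is an assumption made for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the name-node logic described in the report.
class HeartbeatLoopSketch {

    // Hypothetical stand-in for DatanodeDescriptor.
    static class Node {
        final String name;
        boolean isAlive;

        Node(String name, boolean isAlive) {
            this.name = name;
            this.isAlive = isAlive;
        }
    }

    // Mirrors the reported behaviour: a node already marked dead is
    // skipped, so it is never taken out of the heartbeats list.
    static void removeDatanode(List<Node> heartbeats, Node node) {
        if (node.isAlive) {
            heartbeats.remove(node);
            node.isAlive = false;
        }
    }

    // Keeps looking for a dead node and asking for its removal.
    // maxPasses is an assumption added so the sketch terminates;
    // without it the loop would never exit for the reported state.
    static int heartbeatCheck(List<Node> heartbeats, int maxPasses) {
        int passes = 0;
        while (passes < maxPasses) {
            Node dead = null;
            for (Node n : heartbeats) {
                if (!n.isAlive) {
                    dead = n;
                    break;
                }
            }
            if (dead == null) {
                break; // no dead node left, the check is done
            }
            removeDatanode(heartbeats, dead);
            passes++;
        }
        return passes;
    }

    public static void main(String[] args) {
        // The state seen in the debugger: the same dead instance listed twice.
        Node node = new Node("0.0.0.0:50077", false);
        List<Node> heartbeats = new ArrayList<>();
        heartbeats.add(node);
        heartbeats.add(node);

        int passes = heartbeatCheck(heartbeats, 5);
        System.out.println("passes=" + passes + ", entries left=" + heartbeats.size());
    }
}
```

With the cap set to 5, main prints "passes=5, entries left=2": every pass finds the dead entry and none removes it, which matches the repeated "Removing a node" and "lost heartbeat" log lines.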