[
https://issues.apache.org/jira/browse/HDFS-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hari Mankude updated HDFS-2753:
-------------------------------
Summary: Standby namenode stuck in safemode during a failover (was:
Standby namenode stuck in safenode during a failover)
> Standby namenode stuck in safemode during a failover
> ----------------------------------------------------
>
> Key: HDFS-2753
> URL: https://issues.apache.org/jira/browse/HDFS-2753
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Hari Mankude
> Assignee: Hari Mankude
>
> teragen is run to generate write traffic. A manual failover is initiated by
> killing namenode process. namenode which was killed is then restarted and it
> comes up as a standby. However, the standby never exits out of the safemode.
> Looking at the standby logs, it looks like the standby namenode gets
> addStoredBlock immediately after data node registration which results in
> numBlocks getting incremented. There is an optimization in processReport()
> which causes the follow on complete block report to be ignored when NN is in
> safemode. So, NN never exits out of the safemode.
> 2012-01-05 18:57:46,030 INFO hdfs.StateChange
> (DatanodeManager.java:registerDatanode(573)) - BLOCK*
> NameSystem.registerDatanode: node registration from 98.137.233.235:50010
> storage DS-526656430-98.137.233.235-50010-1325723536492
> 2012-01-05 18:57:46,033 INFO net.NetworkTopology
> (NetworkTopology.java:add(344)) - Adding a new node:
> /default-rack/98.137.233.235:50010
> 2012-01-05 18:57:46,033 INFO namenode.FSNamesystem
> (FSNamesystem.java:checkMode(3411)) - DID NOT call initialize at 1
> 2012-01-05 18:57:46,034 INFO hdfs.StateChange
> (DatanodeManager.java:registerDatanode(573)) - BLOCK*
> NameSystem.registerDatanode: node registration from 98.137.233.237:50010
> storage DS-1961520590-98.137.233.237-50010-1325725253057
> 2012-01-05 18:57:46,034 INFO net.NetworkTopology
> (NetworkTopology.java:add(344)) - Adding a new node:
> /default-rack/98.137.233.237:50010
> 2012-01-05 18:57:46,042 INFO namenode.FSNamesystem
> (FSNamesystem.java:checkMode(3411)) - DID NOT call initialize at 1
> 2012-01-05 18:57:46,045 INFO hdfs.StateChange
> (BlockManager.java:addStoredBlock(1775)) - BLOCK* addStoredBlock: blockMap
> updated: 98.137.233.235:50010 is added to
> blk_-3183325095022454724_1172{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[98.137.233.235:50010|FINALIZED]]} size 0
> 2012-01-05 18:57:46,046 INFO hdfs.StateChange
> (BlockManager.java:addStoredBlock(1775)) - BLOCK* addStoredBlock: blockMap
> updated: 98.137.233.235:50010 is added to
> blk_5617057825952660916_1173{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[98.137.233.235:50010|FINALIZED]]} size 0
> 2012-01-05 18:57:46,046 INFO hdfs.StateChange
> (BlockManager.java:addStoredBlock(1775)) - BLOCK* addStoredBlock: blockMap
> updated: 98.137.233.237:50010 is added to
> blk_-3183325095022454724_1172{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[98.137.233.235:50010|FINALIZED],
> ReplicaUnderConstruction[98.137.233.237:50010|FINALIZED]]} size 0
> 2012-01-05 18:57:46,046 INFO hdfs.StateChange
> (BlockManager.java:addStoredBlock(1775)) - BLOCK* addStoredBlock: blockMap
> updated: 98.137.233.237:50010 is added to
> blk_5617057825952660916_1173{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[98.137.233.235:50010|FINALIZED],
> ReplicaUnderConstruction[98.137.233.237:50010|FINALIZED]]} size 0
> 2012-01-05 18:57:46,049 INFO hdfs.StateChange
> (BlockManager.java:processReport(1365)) - BLOCK* processReport: discarded
> non-initial block report from 98.137.233.235:50010 because namenode still in
> startup phase
> 2012-01-05 18:57:46,049 INFO hdfs.StateChange
> (BlockManager.java:processReport(1365)) - BLOCK* processReport: discarded
> non-initial block report from 98.137.233.237:50010 because namenode still in
> startup phase
> 2012-01-05 18:58:05,167 INFO namenode.NameNode
> (NameNodeRpcServer.java:blockReceivedAndDeleted(894)) - Required GS=1175,
> Queuing blockReceivedAndDeleted message
> 2012-01-05 18:58:05,168 INFO namenode.NameNode
> (NameNodeRpcServer.java:blockReceivedAndDeleted(894)) - Required GS=1175,
> Queuing blockReceivedAndDeleted message
> 2012-01-05 18:58:06,634 INFO namenode.NameNode
> (NameNodeRpcServer.java:blockReceivedAndDeleted(894)) - Required GS=1176,
> Queuing blockReceivedAndDeleted message
> 2012-01-05 18:58:06,636 INFO namenode.NameNode
> (NameNodeRpcServer.java:blockReceivedAndDeleted(894)) - Required GS=1176,
> Queuing blockReceivedAndDeleted message
> 2012-01-05 18:58:08,097 INFO namenode.NameNode
> (NameNodeRpcServer.java:blockReceivedAndDeleted(894)) - Required GS=1177,
> Queuing blockReceivedAndDeleted message
> 2012-01-05 18:58:08,097 INFO namenode.NameNode
> (NameNodeRpcServer.java:blockReceivedAndDeleted(894)) - Required GS=1177,
> Queuing blockReceivedAndDeleted message
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira