[ https://issues.apache.org/jira/browse/HDFS-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584465#comment-14584465 ]
Brahma Reddy Battula commented on HDFS-8586:
--------------------------------------------
*Test Code to reproduce this bug*
{code}
@Test
public void testDeadDatanodeForBlockLocation() throws Exception {
  Configuration conf = new HdfsConfiguration();
  conf.setInt(DFSConfigKeys.DFS_NAMENODE_HEARTBEAT_RECHECK_INTERVAL_KEY, 500);
  conf.setLong(DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY, 1L);
  MiniDFSCluster cluster =
      new MiniDFSCluster.Builder(conf).numDataNodes(3).build();
  try {
    cluster.waitActive();
    String poolId = cluster.getNamesystem().getBlockPoolId();
    // Wait for the first datanode to be marked live.
    DataNode dn = cluster.getDataNodes().get(0);
    DatanodeRegistration reg =
        DataNodeTestUtils.getDNRegistrationForBP(dn, poolId);
    DFSTestUtil.waitForDatanodeState(cluster, reg.getDatanodeUuid(), true,
        20000);
    // Shut the datanode down and wait for it to be marked dead.
    dn.shutdown();
    DFSTestUtil.waitForDatanodeState(cluster, reg.getDatanodeUuid(), false,
        20000);
    System.out.println("Dn down: " + dn.getDisplayName());
    // Write a file after the node is dead and inspect its block locations.
    Path file = new Path("afile");
    try (FSDataOutputStream outputStream =
        cluster.getFileSystem().create(file)) {
      outputStream.writeChars("testContent");
    }
    BlockLocation block = cluster.getFileSystem().getFileBlockLocations(file,
        0, 10)[0];
    // The dead datanode must not appear among the block's locations.
    for (String node : block.getNames()) {
      System.out.println(node);
      if (node.equals(dn.getDisplayName())) {
        fail("Not expecting the block in a dead node");
      }
    }
  } finally {
    cluster.shutdown();
  }
}
{code}
*Impact which I have seen*
{color:red}The cluster has 9 DataNodes and dfs.replication=3. After stopping 5 of the DataNodes and putting files to HDFS continuously, some write operations failed.{color}
I think dead nodes can also get through here, so they should be excluded as well (see the sketch after the snippet below):
{code}
if (isGoodTarget(storage, blockSize, maxNodesPerRack, considerLoad,
    results, avoidStaleNodes, storageType)) {
  results.add(storage);
  // add node and related nodes to excludedNode
  return addToExcludedNodes(storage.getDatanodeDescriptor(), excludedNodes);
}
{code}
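To illustrate the idea, a minimal sketch only (not the committed HDFS-8586 patch): reject a storage whose datanode has already been marked dead at the top of isGoodTarget(), before the existing capacity/load/rack checks. It assumes the branch-2 BlockPlacementPolicyDefault internals, i.e. the DatanodeDescriptor#isAlive flag and the logNodeIsNotChosen(...) helper.
{code}
// Sketch only (assumed placement: top of BlockPlacementPolicyDefault#isGoodTarget).
// Skip datanodes the HeartbeatManager has already marked dead, so they can
// never be added to results or chosen as write targets.
DatanodeDescriptor node = storage.getDatanodeDescriptor();
if (!node.isAlive) {
  logNodeIsNotChosen(storage, "the datanode is dead ");
  return false;
}
{code}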
> Dead Datanode is allocated for write when client is from deadnode
> ------------------------------------------------------------------
>
> Key: HDFS-8586
> URL: https://issues.apache.org/jira/browse/HDFS-8586
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Brahma Reddy Battula
> Assignee: Brahma Reddy Battula
>
> *{color:blue}DataNode marked as Dead{color}*
> 2015-06-11 19:39:00,862 | INFO | org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@28ec166e | BLOCK* *removeDeadDatanode: lost heartbeat from XX.XX.39.33:25009* | org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.removeDeadDatanode(DatanodeManager.java:584)
> 2015-06-11 19:39:00,863 | INFO | org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@28ec166e | Removing a node: /default/rack3/XX.XX.39.33:25009 | org.apache.hadoop.net.NetworkTopology.remove(NetworkTopology.java:488)
> *{color:blue}Deadnode got Allocated{color}*
> 2015-06-11 19:39:45,148 | WARN | IPC Server handler 26 on 25000 | The cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616)
> 2015-06-11 19:39:45,149 | WARN | IPC Server handler 26 on 25000 | The cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616)
> 2015-06-11 19:39:45,149 | WARN | IPC Server handler 26 on 25000 | The cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616)
> 2015-06-11 19:39:45,149 | WARN | IPC Server handler 26 on 25000 | The cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616)
> 2015-06-11 19:39:45,149 | INFO | IPC Server handler 26 on 25000 | BLOCK* *allocate blk_1073754030_13252* {UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-e8d29773-dfc2-4224-b1d6-9b0588bca55e:NORMAL:{color:red}XX.XX.39.33:25009{color}|RBW], ReplicaUC[[DISK]DS-f7d2ab3c-88f7-470c-9097-84387c0bec83:NORMAL:XX.XX.38.32:25009|RBW], ReplicaUC[[DISK]DS-8c2a464a-ac81-4651-890a-dbfd07ddd95f:NORMAL:*XX.XX.38.33:25009|RBW]]* } for /t1._COPYING_ | org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveAllocatedBlock(FSNamesystem.java:3657)
> 2015-06-11 19:39:45,191 | INFO | IPC Server handler 35 on 25000 | BLOCK* allocate blk_1073754031_13253{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-ed8ad579-50c0-4e3e-8780-9776531763b6:NORMAL:XX.XX.39.31:25009|RBW], ReplicaUC[[DISK]DS-19ddd6da-4a3e-481a-8445-dde5c90aaff3:NORMAL:XX.XX.37.32:25009|RBW], ReplicaUC[[DISK]DS-4ce4ce39-4973-42ce-8c7d-cb41f899db85:{{NORMAL:XX.XX.37.33:25009}} |RBW]]} for /t1._COPYING_ | org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveAllocatedBlock(FSNamesystem.java:3657)