Ghost nodes in excluded node list for block allocation limit replication target
count
-------------------------------------------------------------------------------------
Key: HDFS-1168
URL: https://issues.apache.org/jira/browse/HDFS-1168
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs client, name-node
Reporter: Todd Lipcon
In HDFS-630 we added an excludedNodes parameter when allocating a block. On a
cluster whose datanodes use transient IPC ports, this list can accumulate past
incarnations of restarted datanodes. Then, in
NetworkTopology.countNumOfAvailableNodes, each of these "ghost" nodes is
subtracted from the total number of available nodes, so the NN decides that
there is nowhere to place replicas, even though plenty of datanodes are alive.
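
The counting problem can be illustrated with a toy model. This is only a sketch,
not the actual NetworkTopology code: the class GhostNodeCount, both method names,
and the "host:port" strings are hypothetical, and the ghost-aware variant is one
possible way to avoid the miscount, not necessarily the fix that will be applied.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Toy model of the availability count (hypothetical, not Hadoop code). */
final class GhostNodeCount {

    /** Naive count: every excluded entry is subtracted, whether it is live or not. */
    static int naiveAvailable(Set<String> liveNodes, Set<String> excluded) {
        return liveNodes.size() - excluded.size();
    }

    /** Ghost-aware count: only excluded entries still in the topology are subtracted. */
    static int ghostAwareAvailable(Set<String> liveNodes, Set<String> excluded) {
        int excludedAndLive = 0;
        for (String node : excluded) {
            if (liveNodes.contains(node)) {
                excludedAndLive++;
            }
        }
        return liveNodes.size() - excludedAndLive;
    }

    public static void main(String[] args) {
        // dn3 has restarted twice; its current incarnation listens on port 41003.
        Set<String> live = new HashSet<>(
                Arrays.asList("dn1:50010", "dn2:50010", "dn3:41003"));
        // The excluded list still holds the two dead incarnations of dn3.
        Set<String> excluded = new HashSet<>(
                Arrays.asList("dn3:41001", "dn3:41002"));

        System.out.println("naive       = " + naiveAvailable(live, excluded));      // 1
        System.out.println("ghost-aware = " + ghostAwareAvailable(live, excluded)); // 3
    }
}
```

In this model three datanodes are alive and none of them is actually excluded,
yet the naive count reports a single available node.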
To reproduce, start writing data into HDFS with a very small block size (say
4KB) and then repeatedly kill and restart the local DN, which is configured to
use a transient port. After N restarts, where N is the number of nodes in the
cluster, the NN will fail to allocate any targets even though N other nodes are
still alive.
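
The reproduction steps can be simulated without a cluster. The sketch below
assumes the writer keeps a single excluded list for the lifetime of the stream
and that each restart adds the dead incarnation to it; the class GhostNodeRepro,
the method restartsUntilNoTargets, and the port numbers are all hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy simulation of the repro (hypothetical, not Hadoop code). */
final class GhostNodeRepro {

    /** Returns how many restarts of the local DN it takes for the naive count to reach zero. */
    static int restartsUntilNoTargets(int clusterSize) {
        Set<String> live = new HashSet<>();
        for (int i = 1; i < clusterSize; i++) {
            live.add("dn" + i + ":50010");   // remote DNs that never restart
        }
        int port = 40000;                    // assumed starting transient port
        String local = "local:" + port;
        live.add(local);

        Set<String> excluded = new HashSet<>();
        int restarts = 0;
        // Naive availability check: live nodes minus every excluded entry.
        while (live.size() - excluded.size() > 0) {
            live.remove(local);              // kill the local DN
            excluded.add(local);             // writer excludes the dead incarnation
            port++;                          // restart on a new transient port
            local = "local:" + port;
            live.add(local);
            restarts++;
        }
        return restarts;
    }

    public static void main(String[] args) {
        System.out.println("restarts needed for 4 nodes: " + restartsUntilNoTargets(4));
    }
}
```

Under these assumptions the number of live nodes never changes, but the naive
count drops by one per restart and reaches zero after as many restarts as there
are nodes in the cluster.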