Vinay created HDFS-5112:
---------------------------
Summary: NetworkTopology#countNumOfAvailableNodes() returns the wrong
value if the excluded nodes passed in are not part of the cluster tree
Key: HDFS-5112
URL: https://issues.apache.org/jira/browse/HDFS-5112
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Affects Versions: 2.0.5-alpha, 3.0.0
Reporter: Vinay
Assignee: Vinay
I got "File /hdfs_COPYING_ could only be replicated to 0 nodes instead of
minReplication (=1). There are 1 datanode(s) running and 1 node(s) are
excluded in this operation." in the following case:
1. A cluster with 2 DNs.
2. One of the datanodes had not been responding for the last 10 minutes, and
the NN was about to mark it as dead.
3. I tried to write one file; the NN allocated both DNs for the block.
4. While creating the pipeline, the client took some time to detect the failure
of one node.
5. Before the client detected the pipeline failure, the dead node was removed
from the cluster map on the NN side.
6. The client then abandoned the previous block and asked for a new block with
the dead node in the excluded list, and got the above exception even though one
live node was still available.
Digging into this further, I found that
{{NetworkTopology#countNumOfAvailableNodes()}} does not return the correct count
when the excluded nodes passed from the client are not part of the cluster map.
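To illustrate the kind of miscount described here, below is a minimal, self-contained Java sketch. It is not the Hadoop implementation: the class {{AvailableNodeCount}}, the methods {{naiveCount}} and {{checkedCount}}, and the use of plain strings for nodes are all hypothetical, and it assumes the available count is derived by subtracting the excluded nodes from the nodes in the cluster map. Under that assumption, an excluded node that has already been removed from the map is subtracted even though it was never counted.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the miscount; not the actual NetworkTopology code.
final class AvailableNodeCount {

    // Naive count: assumes every excluded node is still in the cluster map.
    static int naiveCount(Set<String> clusterMap, Collection<String> excluded) {
        return clusterMap.size() - excluded.size();
    }

    // Checked count: subtract only excluded nodes that are present in the map.
    static int checkedCount(Set<String> clusterMap, Collection<String> excluded) {
        Set<String> present = new HashSet<>(excluded);
        present.retainAll(clusterMap);
        return clusterMap.size() - present.size();
    }

    public static void main(String[] args) {
        // Two-DN cluster; dn2 is then detected as dead and removed from the map.
        Set<String> clusterMap = new HashSet<>(List.of("dn1", "dn2"));
        clusterMap.remove("dn2");

        // The client still passes the dead node in its excluded list.
        List<String> excluded = List.of("dn2");

        System.out.println("naive   = " + naiveCount(clusterMap, excluded));   // naive   = 0
        System.out.println("checked = " + checkedCount(clusterMap, excluded)); // checked = 1
    }
}
```

In this model the naive count drops to 0 while one live datanode remains, which matches the "could only be replicated to 0 nodes" symptom in the scenario above.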