Chen Liang created HDFS-11507:
---------------------------------
Summary: NetworkTopology#chooseRandom may run into a dead loop due
to race condition
Key: HDFS-11507
URL: https://issues.apache.org/jira/browse/HDFS-11507
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Reporter: Chen Liang
Assignee: Chen Liang
{{NetworkTopology#chooseRandom()}} works as follows:
1. It counts the number of available nodes as {{availableNodes}}.
2. It checks how many nodes are excluded and deducts that count from
{{availableNodes}}.
3. If {{availableNodes}} is still > 0, it assumes there are nodes available.
4. It keeps looping until it finds such a node.
Now imagine that, in the meantime, the nodes that were actually available are
removed during step 3 or step 4, and all remaining nodes are excluded nodes.
Then, although no node is actually available any more, the code still proceeds
because {{availableNodes}} > 0. It keeps picking excluded nodes and loops
forever, because
{{if (excludedNodes == null || !excludedNodes.contains(ret))}}
will always be false.
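The scenario above can be sketched with a simplified model. This is not the actual {{NetworkTopology}} code: a plain {{List<String>}} stands in for the topology, and {{chooseWithStaleCount}}, {{concurrentChange}} and {{maxAttempts}} are hypothetical names introduced only for illustration. The {{maxAttempts}} guard exists solely so the sketch terminates; the loop described above has no such bound.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class StaleCountSketch {
    /**
     * Simplified model: availableNodes is computed once, before the loop.
     * Returns the chosen node, or null if none was found within maxAttempts
     * (a hypothetical guard so that this sketch cannot hang).
     */
    static String chooseWithStaleCount(List<String> nodes, Set<String> excludedNodes,
                                       Runnable concurrentChange, int maxAttempts) {
        // Steps 1-2: count the nodes that are not excluded.
        int availableNodes = 0;
        for (String n : nodes) {
            if (excludedNodes == null || !excludedNodes.contains(n)) {
                availableNodes++;
            }
        }
        // Step 3: trust the count.
        if (availableNodes <= 0) {
            return null;
        }
        // Stand-in for another thread removing nodes after the count was taken.
        concurrentChange.run();

        // Step 4: keep picking; availableNodes is never refreshed.
        Random r = new Random(0);
        for (int attempt = 0; attempt < maxAttempts && !nodes.isEmpty(); attempt++) {
            String ret = nodes.get(r.nextInt(nodes.size()));
            if (excludedNodes == null || !excludedNodes.contains(ret)) {
                return ret;
            }
        }
        return null; // without the guard, this point would never be reached
    }

    public static void main(String[] args) {
        List<String> nodes = new ArrayList<>(Arrays.asList("a", "b", "c"));
        Set<String> excluded = new HashSet<>(Arrays.asList("b", "c"));
        // "a" is the only usable node and disappears after the count.
        System.out.println(chooseWithStaleCount(nodes, excluded, () -> nodes.remove("a"), 1000));
    }
}
```

In this model, once "a" is removed every pick is an excluded node, so only the artificial attempt limit ends the loop.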
We may fix this by expanding the while loop to also include the
{{availableNodes}} calculation, so that {{availableNodes}} is recalculated
every time the loop fails to find an available node.
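A minimal sketch of the proposed direction, using a simplified model rather than the real {{NetworkTopology}} code. The names {{chooseWithRecount}} and {{concurrentChange}} are hypothetical, and a {{List<String>}} stands in for the topology. The only point illustrated is that the count moves inside the loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class RecountSketch {
    /**
     * Simplified model: availableNodes is recomputed on every iteration,
     * so the loop exits once no non-excluded node remains.
     */
    static String chooseWithRecount(List<String> nodes, Set<String> excludedNodes,
                                    Runnable concurrentChange) {
        Random r = new Random(0);
        boolean changed = false;
        while (true) {
            // Recount on every pass instead of once up front.
            int availableNodes = 0;
            for (String n : nodes) {
                if (excludedNodes == null || !excludedNodes.contains(n)) {
                    availableNodes++;
                }
            }
            if (availableNodes <= 0) {
                return null; // nothing left to choose from
            }
            // Stand-in for a concurrent removal, applied once after the first count.
            if (!changed) {
                concurrentChange.run();
                changed = true;
            }
            if (nodes.isEmpty()) {
                continue; // the next recount will return null
            }
            String ret = nodes.get(r.nextInt(nodes.size()));
            if (excludedNodes == null || !excludedNodes.contains(ret)) {
                return ret;
            }
        }
    }

    public static void main(String[] args) {
        List<String> nodes = new ArrayList<>(Arrays.asList("a", "b", "c"));
        Set<String> excluded = new HashSet<>(Arrays.asList("b", "c"));
        // Same race as in the description: "a" disappears after the first count.
        System.out.println(chooseWithRecount(nodes, excluded, () -> nodes.remove("a")));
    }
}
```

In this model, the first failed pick triggers a recount that sees zero available nodes, so the method returns instead of spinning.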