[ https://issues.apache.org/jira/browse/HDFS-5837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tao Luo updated HDFS-5837:
--------------------------
Status: Patch Available (was: Open)
> dfs.namenode.replication.considerLoad does not consider decommissioned nodes
> ----------------------------------------------------------------------------
>
> Key: HDFS-5837
> URL: https://issues.apache.org/jira/browse/HDFS-5837
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.2.0, 2.0.6-alpha, 2.0.0-alpha
> Reporter: Bryan Beaudreault
> Assignee: Tao Luo
> Attachments: HDFS-5837.patch
>
>
> In BlockPlacementPolicyDefault, the setting
> dfs.namenode.replication.considerLoad tries to balance load across the
> cluster when choosing replica locations. However, this code does not take
> decommissioned nodes into account.
> The considerLoad code computes the average load as TotalClusterLoad /
> numNodes. However, numNodes includes decommissioned nodes, which have 0
> load, so the average load is artificially low. Example:
> TotalLoad           = 250
> numNodes            = 100
> decommissionedNodes = 70
> remainingNodes      = numNodes - decommissionedNodes = 30
> avgLoad             = TotalLoad / numNodes = 250 / 100 = 2.50
> trueAvgLoad         = TotalLoad / remainingNodes = 250 / 30 ≈ 8.33
> If the remaining 30 nodes each carry roughly the true average load of 8.33,
> that is more than twice the calculated average load of 2.50, which is enough
> for considerLoad to reject them as replica locations. The end result is that
> every node is rejected and no replicas can be placed.
> See the exceptions printed by the client in this scenario:
> https://gist.github.com/bbeaudreault/49c8aa4bb231de54e9c1
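The arithmetic in the report can be reproduced with a small standalone sketch. This is not HDFS code: the class `LoadCheckSketch`, the `Node` type, and the methods `averageLoad` and `countRejected` are hypothetical names for illustration. The 2.0 rejection factor is assumed from the "more than 2x" wording above. The split of the 250 total load into ten nodes at 9 and twenty nodes at 8 is also made up; it only needs to sum to 250 across the 30 remaining nodes.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the considerLoad average-load check from HDFS-5837.
 * Names are illustrative only and do not correspond to the HDFS API.
 */
public class LoadCheckSketch {

    /** A datanode reduced to the two fields the check needs (hypothetical). */
    public static final class Node {
        final int load;
        final boolean decommissioned;

        Node(int load, boolean decommissioned) {
            this.load = load;
            this.decommissioned = decommissioned;
        }
    }

    /** Assumed rejection factor: a node is skipped if load > LOAD_FACTOR * avgLoad. */
    public static final double LOAD_FACTOR = 2.0;

    /** Total load divided by node count; optionally leaves decommissioned nodes out of the count. */
    public static double averageLoad(List<Node> nodes, boolean excludeDecommissioned) {
        int total = 0;
        int count = 0;
        for (Node n : nodes) {
            total += n.load; // decommissioned nodes contribute 0 load either way
            if (!excludeDecommissioned || !n.decommissioned) {
                count++;
            }
        }
        return count == 0 ? 0.0 : (double) total / count;
    }

    /** Counts in-service nodes whose load exceeds LOAD_FACTOR * avgLoad. */
    public static int countRejected(List<Node> nodes, double avgLoad) {
        int rejected = 0;
        for (Node n : nodes) {
            if (!n.decommissioned && n.load > LOAD_FACTOR * avgLoad) {
                rejected++;
            }
        }
        return rejected;
    }

    /** The report's example: 100 nodes, 70 decommissioned, total load 250 on the other 30. */
    public static List<Node> buildScenario() {
        List<Node> nodes = new ArrayList<>();
        for (int i = 0; i < 70; i++) nodes.add(new Node(0, true));  // decommissioned, 0 load
        for (int i = 0; i < 10; i++) nodes.add(new Node(9, false)); // 10 * 9 = 90
        for (int i = 0; i < 20; i++) nodes.add(new Node(8, false)); // 20 * 8 = 160
        return nodes;
    }

    public static void main(String[] args) {
        List<Node> nodes = buildScenario();
        double naive = averageLoad(nodes, false);    // 250 / 100
        double inService = averageLoad(nodes, true); // 250 / 30
        System.out.printf("naive avg = %.2f, rejected = %d of 30%n",
                naive, countRejected(nodes, naive));
        System.out.printf("in-service avg = %.2f, rejected = %d of 30%n",
                inService, countRejected(nodes, inService));
    }
}
```

With the naive average (2.50), the threshold is 5.0. All 30 in-service nodes, at load 8 or 9, exceed it and are rejected, which matches the "no replicas can be placed" symptom. If decommissioned nodes are left out of the denominator, the average becomes about 8.33 and the threshold about 16.67, so none of the 30 nodes is rejected.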
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)