caozhiqiang created HDFS-16456:
----------------------------------

             Summary: EC: Decommissioning a rack with only one DN will fail when 
the number of racks equals the replication number
                 Key: HDFS-16456
                 URL: https://issues.apache.org/jira/browse/HDFS-16456
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: ec, namenode
    Affects Versions: 3.1.1
            Reporter: caozhiqiang


In the scenario below, decommissioning fails with the reason TOO_MANY_NODES_ON_RACK:
 # Enable an EC policy, such as RS-6-3-1024k.
 # The number of racks in the cluster equals the replication number (9).
 # A rack has only one DN, and that DN is decommissioned.

The root cause is in the 
BlockPlacementPolicyRackFaultTolerant::getMaxNodesPerRack() function, which 
supplies the maxNodesPerRack limit used when choosing targets. In this scenario, 
maxNodesPerRack is 1, which means only one datanode can be chosen from each rack.
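
The arithmetic behind that limit can be illustrated with a small standalone sketch. It assumes the limit is a ceiling division of the total replica count by the number of racks; the class and method names are hypothetical and this is not the Hadoop source.

```java
// Hypothetical, simplified model of a per-rack limit; not the Hadoop code.
class RackLimitSketch {

    // Assumption: the limit is ceil(totalReplicas / numOfRacks),
    // written with integer arithmetic.
    static int maxNodesPerRack(int totalReplicas, int numOfRacks) {
        if (numOfRacks <= 1 || totalReplicas <= 1) {
            return totalReplicas; // nothing to spread across racks
        }
        return (totalReplicas - 1) / numOfRacks + 1;
    }

    public static void main(String[] args) {
        // RS-6-3 has 9 blocks per group; the cluster has 9 racks.
        System.out.println(maxNodesPerRack(9, 9)); // prints 1
        // If only 8 racks were counted, two blocks could share a rack.
        System.out.println(maxNodesPerRack(9, 8)); // prints 2
    }
}
```

Under this assumption, a limit of 1 leaves no room once the single-DN rack is unavailable: the other 8 racks each already hold one block of the group, so no rack can accept the block being moved.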

When we decommission a DN that is the only node in its rack, 
chooseOnce() in BlockPlacementPolicyRackFaultTolerant::chooseTargetInOrder() 
throws NotEnoughReplicasException, but the exception is not caught, so placement 
fails to fall back to the chooseEvenlyFromRemainingRacks() function.
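
The catch-and-fall-back shape described above might look like the following toy model. Everything here is hypothetical (the class, the exception type, the greedy selection, and relaxing the limit by one); it only shows the control flow of catching the strict-placement failure and retrying with a looser limit, not Hadoop's actual placement logic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of "strict attempt, then relaxed fallback".
class FallbackPlacementSketch {

    // Stand-in for NotEnoughReplicasException; unchecked to keep the sketch short.
    static class NotEnoughTargetsException extends RuntimeException {
        NotEnoughTargetsException(String msg) { super(msg); }
    }

    // Greedily picks `needed` nodes, taking at most `maxPerRack` from each rack.
    static List<String> chooseWithLimit(Map<String, List<String>> racks,
                                        int needed, int maxPerRack) {
        List<String> chosen = new ArrayList<>();
        for (List<String> nodes : racks.values()) {
            int taken = 0;
            for (String node : nodes) {
                if (chosen.size() == needed) return chosen;
                if (taken == maxPerRack) break;
                chosen.add(node);
                taken++;
            }
        }
        if (chosen.size() < needed) {
            throw new NotEnoughTargetsException(
                "only " + chosen.size() + " of " + needed + " targets found");
        }
        return chosen;
    }

    // Tries the strict limit first; on failure, retries with the limit relaxed by one.
    static List<String> chooseTargets(Map<String, List<String>> racks,
                                      int needed, int maxPerRack) {
        try {
            return chooseWithLimit(racks, needed, maxPerRack);
        } catch (NotEnoughTargetsException e) {
            return chooseWithLimit(racks, needed, maxPerRack + 1);
        }
    }

    public static void main(String[] args) {
        // 8 usable racks with 2 nodes each; the ninth rack's only DN is leaving.
        Map<String, List<String>> racks = new LinkedHashMap<>();
        for (int r = 1; r <= 8; r++) {
            racks.put("rack" + r, List.of("dn" + r + "a", "dn" + r + "b"));
        }
        System.out.println(chooseTargets(racks, 9, 1).size()); // prints 9
    }
}
```

Without the catch in chooseTargets, the strict attempt's exception would propagate and no targets would be returned, which mirrors the failure reported here.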



