Konstantin Shvachko commented on HDFS-10967:

Sounds like HDFS-8131 already takes care of this issue.
# {{AvailableSpaceBlockPlacementPolicy}} chooses first replica locally
# It can be configured to prefer emptier nodes for other than the first replica 
(2nd, 3rd, etc.).
# If space usage on two nodes differs less than 5%, 
{{AvailableSpaceBlockPlacementPolicy}} treats them as equal size. 5% is 
One generalization could be to factor the node size into the probability in the 
way that near-full nodes had very low chance of getting new replicas.
But this is probably a different issue, as no configuration change is needed.

> Add configuration for BlockPlacementPolicy to avoid near-full DataNodes
> -----------------------------------------------------------------------
>                 Key: HDFS-10967
>                 URL: https://issues.apache.org/jira/browse/HDFS-10967
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>              Labels: balancer
>         Attachments: HDFS-10967.00.patch, HDFS-10967.01.patch, 
> HDFS-10967.02.patch, HDFS-10967.03.patch
> Large production clusters are likely to have heterogeneous nodes in terms of 
> storage capacity, memory, and CPU cores. It is not always possible to 
> proportionally ingest data into DataNodes based on their remaining storage 
> capacity. Therefore it's possible for a subset of DataNodes to be much closer 
> to full capacity than the rest.
> This heterogeneity is most likely rack-by-rack -- i.e. _m_ whole racks of 
> low-storage nodes and _n_ whole racks of high-storage nodes. So It'd be very 
> useful if we can lower the chance for those near-full DataNodes to become 
> destinations for the 2nd and 3rd replicas.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to