[ https://issues.apache.org/jira/browse/HDFS-10967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563120#comment-15563120 ]
Ming Ma commented on HDFS-10967:
--------------------------------

Thanks [~zhz]. Indeed that is an issue when the cluster has heterogeneous nodes, and the rack-based assumption makes sense to me.
* For the balancer and over-replicated scenarios, {{chooseReplicasToDelete}} uses absolute free space; maybe that needs to be changed to a percentage-based check?
* What if we move this new policy into {{isGoodDatanode}}? That has several benefits:
** {{BlockPlacementPolicyRackFaultTolerant}} can use it.
** It covers the case where the writer is outside the cluster, where the call path is chooseLocalRack -> chooseRandom.
* Typo below: you meant {{this.considerCapacity}}.
{noformat}
this.considerLoad = conf.getBoolean(...);
{noformat}

> Add configuration for BlockPlacementPolicy to avoid near-full DataNodes
> -----------------------------------------------------------------------
>
>                 Key: HDFS-10967
>                 URL: https://issues.apache.org/jira/browse/HDFS-10967
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>              Labels: balancer
>         Attachments: HDFS-10967.00.patch, HDFS-10967.01.patch
>
>
> Large production clusters are likely to have heterogeneous nodes in terms of storage capacity, memory, and CPU cores. It is not always possible to ingest data into DataNodes in proportion to their remaining storage capacity, so it's possible for a subset of DataNodes to be much closer to full capacity than the rest.
> This heterogeneity is most likely rack-by-rack -- i.e. _m_ whole racks of low-storage nodes and _n_ whole racks of high-storage nodes. So it'd be very useful if we could lower the chance for those near-full DataNodes to become destinations for the 2nd and 3rd replicas.
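To illustrate the percentage-based idea suggested above, here is a minimal sketch of the kind of near-full check a placement policy could apply when validating a candidate node. The class name {{NearFullCheck}}, the method {{isNearFull}}, and the 90% threshold are all hypothetical illustrations, not actual HDFS APIs or values from the patch:

```java
// Hypothetical sketch of a percentage-based near-full check, as might be
// applied from a placement policy's isGoodDatanode() path. Names and the
// threshold value are illustrative only, not HDFS code.
public class NearFullCheck {
    // Assumed fraction of capacity above which a node is treated as near-full.
    static final double NEAR_FULL_THRESHOLD = 0.90;

    /** Returns true if used space exceeds the threshold fraction of capacity. */
    static boolean isNearFull(long capacityBytes, long remainingBytes) {
        if (capacityBytes <= 0) {
            return true; // treat unknown or zero capacity as unusable
        }
        double usedFraction = 1.0 - (double) remainingBytes / capacityBytes;
        return usedFraction > NEAR_FULL_THRESHOLD;
    }

    public static void main(String[] args) {
        // Node at 80% used: still a valid target.
        System.out.println(isNearFull(10_000L, 2_000L)); // false
        // Node at 95% used: skipped as near-full.
        System.out.println(isNearFull(10_000L, 500L));   // true
    }
}
```

A percentage-based comparison like this sidesteps the heterogeneity problem: an absolute-free-space check would rank a mostly-full high-capacity node ahead of a mostly-empty low-capacity one.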
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org