[ 
https://issues.apache.org/jira/browse/HDFS-13739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HDFS-13739:
-------------------------------
    Summary: Option to disable Rack Local Write Preference to avoid 2 issues - 
Whole Rack Maintenance without risk of only 1 remaining replica, and avoid 
Major Storage Imbalance across DataNodes caused by uneven spread of Datanodes 
across Racks  (was: Option to disable Rack Local Write Preference to avoid 
Major Storage Imbalance across DataNodes caused by uneven spread of Datanodes 
across Racks)

> Option to disable Rack Local Write Preference to avoid 2 issues - Whole Rack 
> Maintenance without risk of only 1 remaining replica, and avoid Major Storage 
> Imbalance across DataNodes caused by uneven spread of Datanodes across Racks
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-13739
>                 URL: https://issues.apache.org/jira/browse/HDFS-13739
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer & mover, block placement, datanode, fs, 
> hdfs, hdfs-client, namenode, nn, performance
>    Affects Versions: 2.7.3
>         Environment: Hortonworks HDP 2.6
>            Reporter: Hari Sekhon
>            Priority: Major
>
> The current HDFS write pattern of "local node, rack local node, other rack 
> node" is good for most purposes, but when datanodes are spread unevenly 
> across racks it can cause major storage imbalance, with some nodes filling 
> up while others remain half empty.
> I have observed this on a cluster where half the nodes were 85% full and the 
> other half were only 50% full.
> Rack layouts like the following illustrate this - nodes that share a rack 
> send half of their non-local block replicas to each other, so they fill up 
> first, while nodes that are alone in their rack receive far fewer replicas:
> {code:java}
> NumNodes - Rack 
> 2 - rack 1
> 2 - rack 2
> 1 - rack 3
> 1 - rack 4 
> 1 - rack 5
> 1 - rack 6{code}
> In this case, if I reduce the number of replicas to 2 then I get an almost 
> perfect spread of blocks across all datanodes, because HDFS has no choice but 
> to place the single remaining replica on a different rack. If I increase the 
> replicas back to 3 it goes back to 85% on half the nodes and 50% on the other 
> half, because the extra replicas go only to rack local nodes.
> Why not just run the HDFS balancer to fix it, you might say? This is a 
> heavily loaded HBase cluster. Aside from the balancer destroying HBase's data 
> locality and performance by moving blocks out from underneath RegionServers, 
> as soon as an HBase major compaction occurs (at least weekly), all blocks get 
> rewritten by HBase and the HDFS client again writes to local node, rack 
> local node, other rack node, producing the same storage imbalance. 
> Hence this cannot be solved by running the HDFS balancer on HBase clusters, 
> or for any application sitting on top of HDFS that has any HDFS block churn.
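
The imbalance described above can be reproduced with a small back-of-the-envelope model. The sketch below is not the NameNode's actual placement code; it only encodes the write order as the report states it ("local node, rack local node, other rack node") and assumes every datanode writes the same number of blocks. The names `RACKS` and `expected_load` are hypothetical and exist only for this illustration.

```python
from fractions import Fraction

# Rack layout from the report: two racks with 2 datanodes, four racks with 1.
RACKS = {"rack1": 2, "rack2": 2, "rack3": 1, "rack4": 1, "rack5": 1, "rack6": 1}


def expected_load(racks, replication):
    """Expected replicas stored per node when every node writes one block.

    Simplified model (an assumption, not HDFS source code):
      replica 1 -> the writing node itself
      replica 2 -> a rack-local peer, only when replication is 3 and a peer exists
      the rest  -> spread uniformly over nodes in other racks
    """
    if replication not in (2, 3):
        raise ValueError("this sketch only models replication factors 2 and 3")

    nodes = [(rack, i) for rack, count in racks.items() for i in range(count)]
    load = {node: Fraction(0) for node in nodes}

    for writer in nodes:
        peers = [n for n in nodes if n[0] == writer[0] and n != writer]
        remote = [n for n in nodes if n[0] != writer[0]]

        load[writer] += 1                      # local replica
        remaining = replication - 1

        if replication == 3 and peers:         # rack-local replica
            for peer in peers:
                load[peer] += Fraction(1, len(peers))
            remaining -= 1

        for node in remote:                    # off-rack replica(s)
            load[node] += Fraction(remaining, len(remote))

    return load


for replication in (3, 2):
    load = expected_load(RACKS, replication)
    paired = float(load[("rack1", 0)])   # node that shares its rack
    single = float(load[("rack3", 0)])   # node alone in its rack
    print(f"replication={replication}: paired={paired:.2f} single={single:.2f}")
# replication=3: paired=3.48 single=2.52
# replication=2: paired=1.90 single=2.10
```

Under these assumptions, nodes that share a rack hold roughly 38% more replicas than lone nodes at replication 3, while at replication 2 the spread is within about 10%. The gap in the model is smaller than the 85% versus 50% observed on the real cluster, so it should be read as directional only.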


