[
https://issues.apache.org/jira/browse/HDFS-13739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16547572#comment-16547572
]
Hari Sekhon commented on HDFS-13739:
------------------------------------
HDFS-7541 helps with the first issue but not the second, and it is not as
simple a solution as disabling rack-local writes and enforcing rack-remote
writes for the 2nd and 3rd replicas.
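
To make the first issue concrete, here is a minimal sketch that counts how
many replicas of a single block stay reachable while one rack is offline for
maintenance. The rack names, the two placements and the helper
surviving_replicas are hypothetical illustrations, not HDFS code.

```python
def surviving_replicas(replica_racks, offline_rack):
    """Count replicas of one block that stay reachable while a rack is offline."""
    return sum(1 for rack in replica_racks if rack != offline_rack)


# Hypothetical placements for one block written by a client on rack1.
# Pattern described in the issue: local node, rack-local node, other-rack node.
rack_local_placement = ["rack1", "rack1", "rack2"]
# Requested option: every replica on a different rack.
rack_remote_placement = ["rack1", "rack2", "rack3"]

print(surviving_replicas(rack_local_placement, "rack1"))   # 1
print(surviving_replicas(rack_remote_placement, "rack1"))  # 2
```

With two replicas on the rack under maintenance, one further datanode failure
makes the block unavailable; with rack-remote writes a second copy remains.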
> Option to disable Rack Local Write Preference to avoid 2 issues - 1.
> Rack-by-Rack Maintenance leaves last data replica at risk, 2. avoid Major
> Storage Imbalance across DataNodes caused by uneven spread of Datanodes
> across Racks
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-13739
> URL: https://issues.apache.org/jira/browse/HDFS-13739
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: balancer & mover, block placement, datanode, fs,
> hdfs, hdfs-client, namenode, nn, performance
> Affects Versions: 2.7.3
> Environment: Hortonworks HDP 2.6
> Reporter: Hari Sekhon
> Priority: Major
>
> Request to be able to disable Rack Local Write preference / Write All
> Replicas to different Racks.
> Current HDFS write pattern of "local node, rack local node, other rack node"
> is good for most purposes but there are at least 2 scenarios where this is
> not ideal:
> # Rack-by-rack maintenance leaves data at risk of losing its last remaining
> replica. While a rack is down, a single datanode failure would likely cause
> a data outage, or even data loss if the rack is lost, an upgrade fails, or
> the rack is being rebuilt. The only current workaround is to set
> replication to 4, which reduces write performance and wastes storage.
> # Major Storage Imbalance across datanodes when there is an uneven layout of
> datanodes across racks - some nodes fill up while others are half empty.
> I have observed this storage imbalance on a cluster where half the nodes were
> 85% full and the other half were only 50% full.
> Rack layouts like the following illustrate this - nodes that share a rack
> send half of their extra block replicas to each other, so they fill up
> first, while the other nodes receive far fewer replica blocks:
> {code:java}
> NumNodes - Rack
> 2 - rack 1
> 2 - rack 2
> 1 - rack 3
> 1 - rack 4
> 1 - rack 5
> 1 - rack 6{code}
> In this case, if I reduce the number of replicas to 2, I get an almost
> perfect spread of blocks across all datanodes, because HDFS has no choice
> but to place the one remaining extra replica on a different rack. If I
> increase the replicas back to 3, it goes back to 85% on half the nodes and
> 50% on the other half, because the additional replica goes only to
> rack-local nodes.
> Why not just run the HDFS balancer to fix it, you might ask? This is a
> heavily loaded HBase cluster. Aside from destroying HBase's data locality
> and performance by moving blocks out from underneath RegionServers, the
> balancer's work is undone as soon as an HBase major compaction occurs (at
> least weekly): all blocks get rewritten by HBase, and the HDFS client again
> writes to local node, rack-local node, other-rack node, resulting in the
> same storage imbalance. Hence this cannot be solved by running the HDFS
> balancer on HBase clusters, or for any application sitting on top of HDFS
> that has any HDFS block churn.
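
The storage-imbalance argument can be checked with a small model of the rack
layout above. This is a sketch under stated assumptions, not the HDFS
placement code: every datanode writes the same number of blocks, the write
pattern is exactly the "local node, rack local node, other rack node" pattern
described in the issue, candidates are picked uniformly at random, and a
writer with no rack-local peer sends both extra replicas off-rack. The node
names n1..n8 and all function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

# Rack layout from the issue: two racks with 2 nodes, four racks with 1 node.
RACKS = {
    "rack1": ["n1", "n2"],
    "rack2": ["n3", "n4"],
    "rack3": ["n5"],
    "rack4": ["n6"],
    "rack5": ["n7"],
    "rack6": ["n8"],
}
NODE_RACK = {node: rack for rack, nodes in RACKS.items() for node in nodes}
NODES = list(NODE_RACK)


def spread(weight, candidates, load):
    """Add `weight` expected replicas to `load`, split evenly over `candidates`."""
    share = weight / len(candidates)
    for node in candidates:
        load[node] += share


def rack_local_preference():
    """Expected replicas per node for: local node, rack-local node, other-rack node."""
    load = defaultdict(Fraction)
    for writer in NODES:
        load[writer] += 1  # replica 1 stays on the writer's own node
        peers = [n for n in RACKS[NODE_RACK[writer]] if n != writer]
        remote = [n for n in NODES if NODE_RACK[n] != NODE_RACK[writer]]
        if peers:
            spread(Fraction(1), peers, load)   # replica 2: rack-local peer
            spread(Fraction(1), remote, load)  # replica 3: any other-rack node
        else:
            # Assumption: with no rack-local peer, both extra replicas go off-rack.
            spread(Fraction(2), remote, load)
    return dict(load)


def all_racks_distinct():
    """Expected replicas per node when every replica must sit on a different rack."""
    load = defaultdict(Fraction)
    for writer in NODES:
        load[writer] += 1
        second_choices = [n for n in NODES if NODE_RACK[n] != NODE_RACK[writer]]
        for second in second_choices:
            p = Fraction(1, len(second_choices))  # probability of this 2nd replica
            load[second] += p
            used = {NODE_RACK[writer], NODE_RACK[second]}
            third_choices = [n for n in NODES if NODE_RACK[n] not in used]
            spread(p, third_choices, load)
    return dict(load)


def imbalance(load):
    """Ratio of the fullest node to the emptiest node."""
    return max(load.values()) / min(load.values())


for name, policy in [("rack-local preference", rack_local_preference),
                     ("all racks distinct", all_racks_distinct)]:
    load = policy()
    print(name, {n: round(float(v), 3) for n, v in load.items()},
          "max/min =", round(float(imbalance(load)), 3))
```

In this simplified model the fullest-to-emptiest ratio is 73/53 (about 1.38)
with the rack-local preference and 58/47 (about 1.23) when all replicas go to
different racks, so the spread narrows but does not vanish. The real numbers
on a cluster depend on the actual placement policy and write mix.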