Hari Sekhon created HDFS-13739:
----------------------------------
Summary: Option to disable Rack Local Write Preference to avoid
Major Storage Imbalance across DataNodes caused by uneven spread of Datanodes
across Racks
Key: HDFS-13739
URL: https://issues.apache.org/jira/browse/HDFS-13739
Project: Hadoop HDFS
Issue Type: Improvement
Components: balancer & mover, block placement, datanode, fs,
hdfs, hdfs-client, namenode, nn, performance
Affects Versions: 2.7.3
Environment: Hortonworks HDP 2.6
Reporter: Hari Sekhon
The current HDFS write pattern of "local node, rack local node, other rack node" is
good for most purposes, but when datanodes are spread unevenly across racks it can
cause major storage imbalance, with some nodes filling up while others sit half
empty.
I have observed this on a cluster where half the nodes were 85% full and the
other half were only 50% full.
Rack layouts like the following illustrate this: the nodes that share a rack send
their rack-local replicas only to each other, so they fill up first, while nodes
that sit alone on a rack receive far fewer replicas (a rough simulation follows
the layout below):
{code:java}
NumNodes - Rack
2 - rack 1
2 - rack 2
1 - rack 3
1 - rack 4
1 - rack 5
1 - rack 6{code}
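To make the mechanism concrete, here is a minimal simulation of the simplified
"local node, rack local node, other rack node" pattern over this exact layout.
The node names, the uniform choice of writing node, and the random tie-breaking
are illustrative assumptions, not the actual BlockPlacementPolicyDefault logic:
{code:java}
import java.util.*;

// Rough sketch: simulate the simplified write pattern over the
// 2/2/1/1/1/1 rack layout above and count replicas per node.
public class RackImbalanceSim {

  static final String[][] RACKS = {
      {"n1", "n2"},  // rack 1
      {"n3", "n4"},  // rack 2
      {"n5"},        // rack 3
      {"n6"},        // rack 4
      {"n7"},        // rack 5
      {"n8"}         // rack 6
  };

  public static void main(String[] args) {
    Map<String, Integer> rackOf = new HashMap<>();
    List<String> nodes = new ArrayList<>();
    for (int r = 0; r < RACKS.length; r++) {
      for (String n : RACKS[r]) { rackOf.put(n, r); nodes.add(n); }
    }
    Map<String, Integer> blocksPerNode = new HashMap<>();
    Random rnd = new Random(42);
    for (int b = 0; b < 800_000; b++) {
      // Replica 1: the writing node itself (writers assumed uniform).
      String writer = nodes.get(rnd.nextInt(nodes.size()));
      int writerRack = rackOf.get(writer);
      blocksPerNode.merge(writer, 1, Integer::sum);
      // Replica 2: a rack-local node if one exists, else a remote node.
      List<String> rackMates = new ArrayList<>(Arrays.asList(RACKS[writerRack]));
      rackMates.remove(writer);
      String second = rackMates.isEmpty()
          ? pickRemote(nodes, rackOf, writerRack, null, rnd)
          : rackMates.get(rnd.nextInt(rackMates.size()));
      blocksPerNode.merge(second, 1, Integer::sum);
      // Replica 3: a node on a different rack from the writer.
      String third = pickRemote(nodes, rackOf, writerRack, second, rnd);
      blocksPerNode.merge(third, 1, Integer::sum);
    }
    for (String n : nodes) {
      System.out.printf("%s (rack %d): %d blocks%n",
          n, rackOf.get(n) + 1, blocksPerNode.get(n));
    }
  }

  // Pick a node outside avoidRack, and not avoidNode (may be null).
  static String pickRemote(List<String> nodes, Map<String, Integer> rackOf,
                           int avoidRack, String avoidNode, Random rnd) {
    String n;
    do {
      n = nodes.get(rnd.nextInt(nodes.size()));
    } while (rackOf.get(n) == avoidRack || n.equals(avoidNode));
    return n;
  }
}
{code}
In this sketch the nodes in the two-node racks (n1-n4) accumulate roughly 40%
more block replicas than the nodes that sit alone on a rack, which is the same
shape of imbalance as the 85%/50% split described above.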
In this case, if I reduce the replication factor to 2 I get an almost perfect
spread of blocks across all datanodes, because HDFS has no choice but to place
the only other replica on a different rack. If I raise the replication factor
back to 3, usage goes back to 85% on half the nodes and 50% on the other half,
because the extra replica only ever goes to a rack-local node.
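For reference, the replication change on the existing data was the standard
setrep call; the path below is illustrative:
{code}
hdfs dfs -setrep -w 2 /apps/hbase
{code}
(New files still follow the client's dfs.replication setting, so this only
converts data already written.)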
Why not just run the HDFS balancer to fix it, you might ask? This is a heavily
loaded HBase cluster. Aside from the balancer destroying HBase's data locality
and performance by moving blocks out from underneath RegionServers, as soon as
an HBase major compaction occurs (at least weekly) all blocks get rewritten by
HBase, and the HDFS client once again writes to local node, rack local node,
other rack node, recreating the same storage imbalance. Hence this cannot be
solved by running the HDFS balancer, neither on HBase clusters nor for any
other application sitting on top of HDFS that has any HDFS block churn.
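For what it's worth, block placement is already pluggable on the NameNode via
dfs.block.replicator.classname, so one possible shape for this improvement
would be a shipped policy class that drops the rack-local preference. The
snippet below shows only the wiring; the policy class it names is hypothetical
and does not exist today:
{code:xml}
<!-- hdfs-site.xml on the NameNode. The property key is real; the
     value is a hypothetical policy that would skip the rack-local
     write preference, which is what this issue requests. -->
<property>
  <name>dfs.block.replicator.classname</name>
  <value>org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicySpreadAcrossRacks</value>
</property>
{code}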