[ https://issues.apache.org/jira/browse/HDFS-13739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17039675#comment-17039675 ]
Hudson commented on HDFS-13739:
-------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17964 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17964/])
HDFS-13739. Add option to disable rack local write preference. (ayushsaxena: rev ac4b556e2d44d3cd10b81c190ecee23e2dd66c10)
* (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/AddBlockFlag.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDefaultBlockPlacementPolicy.java
* (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CreateFlag.java
* (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java

> Add option to disable rack local write preference
> -------------------------------------------------
>
>                 Key: HDFS-13739
>                 URL: https://issues.apache.org/jira/browse/HDFS-13739
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer & mover, block placement, datanode, fs,
>                      hdfs, hdfs-client, namenode, nn, performance
>    Affects Versions: 2.7.3
>         Environment: Hortonworks HDP 2.6
>            Reporter: Hari Sekhon
>            Assignee: Ayush Saxena
>            Priority: Major
>             Fix For: 3.3.0
>
>         Attachments: HDFS-13739-01.patch
>
>
> Request to be able to disable Rack Local Write preference / Write All
> Replicas to different Racks.
> Current HDFS write pattern of "local node, rack local node, other rack node"
> is good for most purposes but there are at least 2 scenarios where this is
> not ideal:
> # Rack-by-Rack Maintenance leaves data at risk of losing last remaining
> replica. If a single datanode failed it would likely cause some data outage
> or even data loss if the rack is lost or an upgrade fails (or perhaps it's a
> rack rebuild). Setting replicas to 4 would reduce write performance and waste
> storage which is currently the only workaround to that issue.
> # Major Storage Imbalance across datanodes when there is an uneven layout of
> datanodes across racks - some nodes fill up while others are half empty.
> I have observed this storage imbalance on a cluster where half the nodes were
> 85% full and the other half were only 50% full.
> Rack layouts like the following illustrate this - the nodes in the same rack
> will only choose to send half their block replicas to each other, so they
> will fill up first, while other nodes will receive far fewer replica blocks:
> {code:java}
> NumNodes - Rack
> 2 - rack 1
> 2 - rack 2
> 1 - rack 3
> 1 - rack 4
> 1 - rack 5
> 1 - rack 6{code}
> In this case if I reduce the number of replicas to 2 then I get an almost
> perfect spread of blocks across all datanodes because HDFS has no choice but
> to maintain the only 2nd replica on a different rack. If I increase the
> replicas back to 3 it goes back to 85% on half the nodes and 50% on the other
> half, because the extra replicas choose to replicate only to rack local nodes.
> Why not just run the HDFS balancer to fix it you might say? This is a heavily
> loaded HBase cluster - aside from destroying HBase's data locality and
> performance by moving blocks out from underneath RegionServers - as soon as
> an HBase major compaction occurs (at least weekly), all blocks will get
> re-written by HBase and the HDFS client will again write to local node, rack
> local node, other rack node - resulting in the same storage imbalance again.
> Hence this cannot be solved by running HDFS balancer on HBase clusters - or
> for any application sitting on top of HDFS that has any HDFS block churn.
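
The imbalance described above can be illustrated with a rough back-of-the-envelope model. The sketch below is not the logic in BlockPlacementPolicyDefault; it is a toy calculation that takes the write pattern exactly as the description words it ("local node, rack local node, other rack node") and assumes that every datanode writes the same number of blocks, that targets are picked uniformly at random, and that a writer with no rack peer sends both extra replicas to other racks. The names LAYOUT, build_nodes and expected_load are made up for this illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Rack layout from the description: number of datanodes per rack.
LAYOUT = {"rack1": 2, "rack2": 2, "rack3": 1, "rack4": 1, "rack5": 1, "rack6": 1}


def build_nodes(layout):
    """Return a list of (node_name, rack_name) pairs for the layout."""
    return [(f"{rack}-n{i}", rack) for rack, count in layout.items() for i in range(count)]


def expected_load(nodes, replication):
    """Expected replicas stored per node when every node writes one block.

    Toy model (an assumption, not the real HDFS policy):
      replica 1 -> the writer's own node
      replica 2 -> a rack-local peer, if the writer has one (replication 3 only)
      remaining -> uniformly chosen nodes on other racks
    """
    if replication not in (2, 3):
        raise ValueError("this sketch only models replication 2 or 3")
    load = defaultdict(Fraction)
    for writer, writer_rack in nodes:
        load[writer] += 1  # replica 1 stays on the writer
        peers = [n for n, r in nodes if r == writer_rack and n != writer]
        remote = [n for n, r in nodes if r != writer_rack]
        remote_replicas = 1
        if replication == 3:
            if peers:
                # One replica goes to a uniformly chosen rack-local peer.
                for n in peers:
                    load[n] += Fraction(1, len(peers))
            else:
                # No rack peer: both extra replicas leave the rack.
                remote_replicas = 2
        # Picking k distinct nodes uniformly out of m gives each node k/m.
        for n in remote:
            load[n] += Fraction(remote_replicas, len(remote))
    return dict(load)


if __name__ == "__main__":
    nodes = build_nodes(LAYOUT)
    for replication in (3, 2):
        print(f"replication={replication}")
        for name, value in sorted(expected_load(nodes, replication).items()):
            print(f"  {name}: {float(value):.3f}")
```

Under these assumptions, at replication 3 a node in a two-node rack holds 73/21 (about 3.48) replicas per round of writes, against 53/21 (about 2.52) for a node alone in its rack. At replication 2 the figures are 40/21 (about 1.90) and 44/21 (about 2.10). The toy numbers do not reproduce the 85% / 50% split reported above, but they point the same way: the rack-local replica concentrates data on nodes that share a rack, and the skew largely disappears when that replica is not written.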