He Tianyi created HDFS-9090:
-------------------------------

             Summary: Write hot data on few nodes may cause performance issue
                 Key: HDFS-9090
                 URL: https://issues.apache.org/jira/browse/HDFS-9090
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 2.3.0
            Reporter: He Tianyi
            Assignee: He Tianyi


(I am not sure whether this should be filed as a Bug; feel free to change the
issue type.)

The current block placement policy makes a best effort to place the first
replica on the local node whenever possible.

Consider the following scenario:
1. There are 500 datanodes spread across many racks.
2. Raw user action logs (just an example) are written from only 10 nodes,
which also run a datanode locally.
3. Then, before any rebalancing, every block of these logs has at least one
replica on those 10 nodes. With a replication factor of 3, this implies that
about one third of the reads of this data are served by these 10 nodes, and
performance suffers.
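
To make the skew in step 3 concrete, here is a rough back-of-the-envelope
sketch. It assumes (this is my simplification, not how HDFS actually selects
a replica for reading) that each read picks one of the replicas uniformly at
random, that exactly one replica of every block sits on one of the writer
nodes, and that the remaining replicas are spread evenly over the other
datanodes. The function name is made up for illustration.

```python
from fractions import Fraction


def read_share_per_node(total_nodes, writer_nodes, replication):
    """Return (share per writer node, share per other node) of all reads.

    Assumptions: one replica of each block is on a writer node, the other
    replicas are spread evenly over the remaining nodes, and a read picks
    any replica with equal probability.
    """
    hot = Fraction(1, replication) / writer_nodes
    cold = Fraction(replication - 1, replication) / (total_nodes - writer_nodes)
    return hot, cold


hot, cold = read_share_per_node(total_nodes=500, writer_nodes=10, replication=3)
print("share of reads per writer node:", float(hot))
print("share of reads per other node: ", float(cold))
print("writer node load vs other node:", float(hot / cold))  # 24.5
```

Under these assumptions each of the 10 writer nodes serves 1/30 of the reads
for this data, about 24.5 times the load of any other datanode.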

I propose addressing this scenario by introducing a client-side configuration
entry that lets the client disable write locality at an arbitrary level.
Then we can either (A) add the local nodes to excludedNodes, or (B) tell the
NameNode which locality we prefer.
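
As a rough illustration of option (A), the toy model below shows the intended
effect of putting the writer's own node into the excluded set. It is not the
real placement policy: it ignores racks entirely, and choose_targets and its
parameters are hypothetical names used only for this sketch.

```python
import random


def choose_targets(nodes, writer, replication, excluded=frozenset(), rng=random):
    """Toy placement: put the first replica on the writer's node unless it
    is excluded, then fill the rest with distinct random nodes.
    Rack awareness is deliberately left out."""
    candidates = [n for n in nodes if n not in excluded]
    targets = []
    if writer in candidates:
        targets.append(writer)
        candidates.remove(writer)
    targets += rng.sample(candidates, replication - len(targets))
    return targets


nodes = [f"dn{i}" for i in range(500)]
writer = "dn0"
rng = random.Random(0)

# Current behaviour in this model: the first replica lands on the writer.
default = choose_targets(nodes, writer, 3, rng=rng)
# Option (A): the client excludes its local node, so no replica lands there.
no_local = choose_targets(nodes, writer, 3, excluded={writer}, rng=rng)

print(default)
print(no_local)
```

In this model the excluded writer never receives a replica, so the blocks it
writes are spread over the other datanodes from the start.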




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
