[ 
https://issues.apache.org/jira/browse/HDFS-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14877135#comment-14877135
 ] 

Steve Loughran commented on HDFS-9090:
--------------------------------------

I'm in favour of multiple placement policies, not options on the current one 
—with those policies designed to share as much code as they can (i.e. not the 
way the YARN schedulers currently work).

Allowing apps to chose a placement policy when creating an FS client instance 
would let you deploy different configs for different parts of the cluster; this 
would tie in well with label-driven placement

> Write hot data on few nodes may cause performance issue
> -------------------------------------------------------
>
>                 Key: HDFS-9090
>                 URL: https://issues.apache.org/jira/browse/HDFS-9090
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.3.0
>            Reporter: He Tianyi
>            Assignee: He Tianyi
>
> (I am not sure whether this should be reported as BUG, feel free to modify 
> this)
> Current block placement policy makes best effort to guarantee first replica 
> on local node whenever possible.
> Consider the following scenario:
> 1. There are 500 datanodes across plenty of racks,
> 2. Raw user action log (just an example) are being written only on 10 nodes, 
> which also have datanode deployed locally,
> 3. Then, before any balance, all these logs will have at least one replica in 
> 10 nodes, implying one thirds data read on these log will be served by these 
> 10 nodes if repl factor is 3, performance suffers.
> I propose to solve this scenario by introducing a configuration entry for 
> client to disable arbitrary level of write locality.
> Then we can either (A) add local nodes to excludedNodes, or (B) tell NameNode 
> the locality we prefer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to