[
https://issues.apache.org/jira/browse/HDFS-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14900148#comment-14900148
]
He Tianyi commented on HDFS-9090:
---------------------------------
The combination approach makes sense. Thanks, guys.
Since multiple placement policies are not supported yet, I took the approach of
having DFSClient add the nodes in the local rack to {{excludeNodes}} during
calls to {{getAdditionalBlock}}. This is ugly, but it solves the problem for
now.
I'll wait for either HDFS-4894 or HDFS-7068 to be implemented, then use a
custom policy without write locality only for this data.
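For reference, here is a minimal sketch of how the local-rack exclusion set
could be built from public client APIs. The actual injection into
{{getAdditionalBlock}} lives in our private DFSClient patch and is not shown;
the class and method names below ({{LocalRackExclusion}}, {{localRackNodes}})
are made up for illustration.
{code:java}
import java.net.InetAddress;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

/**
 * Sketch of the workaround described above: compute the set of datanodes that
 * share a rack with this client, so a patched DFSClient can pass them as
 * excludeNodes each time it asks the NameNode for a new block. Only the
 * derivation of the exclusion set is shown here.
 */
public class LocalRackExclusion {

  /** Returns all live datanodes in the same rack as the local host. */
  public static List<DatanodeInfo> localRackNodes(DistributedFileSystem dfs)
      throws Exception {
    String localHost = InetAddress.getLocalHost().getHostName();
    DatanodeInfo[] live = dfs.getDataNodeStats();

    // Find the rack of the datanode colocated with this client, if any.
    String localRack = null;
    for (DatanodeInfo dn : live) {
      if (dn.getHostName().equals(localHost)) {
        localRack = dn.getNetworkLocation();
        break;
      }
    }

    List<DatanodeInfo> excluded = new ArrayList<DatanodeInfo>();
    if (localRack == null) {
      return excluded;  // client is not colocated with a datanode
    }
    for (DatanodeInfo dn : live) {
      if (localRack.equals(dn.getNetworkLocation())) {
        excluded.add(dn);
      }
    }
    return excluded;
  }

  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    if (fs instanceof DistributedFileSystem) {
      for (DatanodeInfo dn : localRackNodes((DistributedFileSystem) fs)) {
        System.out.println("would exclude: " + dn);
      }
    }
  }
}
{code}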
> Write hot data on few nodes may cause performance issue
> -------------------------------------------------------
>
> Key: HDFS-9090
> URL: https://issues.apache.org/jira/browse/HDFS-9090
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 2.3.0
> Reporter: He Tianyi
> Assignee: He Tianyi
>
> (I am not sure whether this should be reported as a BUG; feel free to modify
> this.)
> The current block placement policy makes a best effort to place the first
> replica on the local node whenever possible.
> Consider the following scenario:
> 1. There are 500 datanodes across plenty of racks.
> 2. Raw user action logs (just an example) are being written from only 10
> nodes, each of which also has a datanode deployed locally.
> 3. Then, before any balancing, all these logs will have at least one replica
> on those 10 nodes, implying that one third of the reads on these logs will be
> served by these 10 nodes if the replication factor is 3, so performance
> suffers.
> I propose to solve this scenario by introducing a configuration entry that
> lets the client disable write locality up to an arbitrary level.
> Then we can either (A) add local nodes to excludedNodes, or (B) tell the
> NameNode the locality we prefer.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)