[
https://issues.apache.org/jira/browse/HDFS-11419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16241538#comment-16241538
]
Chen Liang commented on HDFS-11419:
-----------------------------------
Hi [~cheersyang],
Thanks for sharing, and sorry for the delayed response; I am on vacation.
So just to make sure we are on the same page: you have 2 SSDs on each of the 500
DNs. For each block, due to the ALL_SSD policy, the NN will always try to use
the 2 SSDs on each of the DNs, but when most of the SSDs are full, the NN will
spend a lot of time trying to find available SSDs among the 500 DNs. Is this
what you saw?
If this is the case, I'm afraid the sub-tasks here will not be enough, because
this JIRA was meant to address a different scenario: there are only 10 nodes
with SSDs among the 500 nodes, and the NN is trying to locate these 10 nodes.
Namely, here it is to *locate nodes with certain storage types*. But in your
case it seems the NN is doing a bit more by trying to *locate nodes with
certain storage types _and enough space_*.
I think this can be done by just adding another sub-task to this JIRA that
tracks the available space of the different storage types; then the rest should
be straightforward. Any other comments? [~linyiqun], [~arpitagarwal],
[~szetszwo].
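As a minimal sketch of what such a sub-task might track (hypothetical class and method names, not the actual Hadoop API): keep an aggregate of available bytes per storage type, updated from DN heartbeats, so placement can bail out early when a storage type has no remaining capacity anywhere in the cluster:

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch: track aggregate available space per storage type
// so placement can skip the random search entirely when a storage type
// is exhausted cluster-wide.
public class StorageTypeSpaceTracker {
    enum StorageType { DISK, SSD, ARCHIVE, RAM_DISK }

    private final Map<StorageType, Long> availableBytes =
        new EnumMap<>(StorageType.class);

    // Called when a DN heartbeat reports a space delta for one storage type.
    public synchronized void update(StorageType type, long deltaBytes) {
        availableBytes.merge(type, deltaBytes, Long::sum);
    }

    // Placement can consult this before starting the random search:
    // if no node can hold the block, don't iterate over 500 DNs at all.
    public synchronized boolean hasSpace(StorageType type, long blockSize) {
        return availableBytes.getOrDefault(type, 0L) >= blockSize;
    }
}
```

This is only the cluster-wide aggregate; per-node bookkeeping would still be needed to pick an actual target, but the aggregate is enough to avoid the pathological full-scan case described above.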
> BlockPlacementPolicyDefault is choosing datanode in an inefficient way
> ----------------------------------------------------------------------
>
> Key: HDFS-11419
> URL: https://issues.apache.org/jira/browse/HDFS-11419
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Chen Liang
> Assignee: Chen Liang
>
> Currently in {{BlockPlacementPolicyDefault}}, {{chooseTarget}} will end up
> calling into {{chooseRandom}}, which will first find a random datanode by
> calling
> {code}DatanodeDescriptor chosenNode = chooseDataNode(scope,
> excludedNodes);{code}, then it checks whether that returned datanode
> satisfies the storage type requirement
> {code}storage = chooseStorage4Block(
> chosenNode, blocksize, results, entry.getKey());{code}
> If yes, {{numOfReplicas--;}}; otherwise, the node is added to the excluded
> nodes, and the loop runs again until {{numOfReplicas}} is down to 0.
> A problem here is that the storage type is not considered until after a
> random node has already been returned. We've seen a case where a cluster has
> a large number of datanodes, while only a few satisfy the storage type
> condition. So, for the most part, this code blindly picks random datanodes
> that do not satisfy the storage type requirement.
> To make matters worse, the way {{NetworkTopology#chooseRandom}} works is
> that, given a set of excluded nodes, it first finds a random datanode; then,
> if it is in the excluded node set, it tries to find another random node. So
> the more excluded nodes there are, the more likely a random node will be in
> the excluded set, in which case one iteration is basically wasted.
> Therefore, this JIRA proposes to augment/modify the relevant classes in a way
> that datanodes can be found more efficiently. There are currently two
> different high level solutions we are considering:
> 1. add some field to the Node base types to describe the storage type info;
> when searching for a node, we take such field(s) into account and do not
> return nodes that do not meet the storage type requirement.
> 2. change the {{NetworkTopology}} class to be aware of storage types, e.g.,
> for each storage type, there is one tree subset that connects all the nodes
> with that type, and a search happens on only one such subset. So nodes with
> other storage types are simply not in the search space.
> Thanks [~szetszwo] for the offline discussion, and thanks [~linyiqun] for
> pointing out a wrong statement (corrected now) in the description. Any
> further comments are more than welcome.
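A rough illustration of option 2 (hypothetical and vastly simplified relative to the real {{NetworkTopology}}, with flat lists standing in for the topology tree): indexing datanodes by storage type means a random pick only ever sees nodes that already satisfy the type requirement, so no iteration is wasted rejecting a node of the wrong type:

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of option 2: index datanodes by storage type so a
// random pick is drawn only from nodes that already have the requested type,
// instead of drawing from the full topology and rejecting mismatches.
public class StorageTypeTopology {
    enum StorageType { DISK, SSD }

    private final Map<StorageType, List<String>> nodesByType =
        new EnumMap<>(StorageType.class);
    private final Random random = new Random();

    public void addNode(String node, StorageType type) {
        nodesByType.computeIfAbsent(type, t -> new ArrayList<>()).add(node);
    }

    // Every candidate drawn here already satisfies the storage type
    // requirement, so the caller never wastes an iteration on a mismatch.
    public String chooseRandom(StorageType type) {
        List<String> candidates = nodesByType.get(type);
        if (candidates == null || candidates.isEmpty()) {
            return null; // no node with this storage type exists
        }
        return candidates.get(random.nextInt(candidates.size()));
    }
}
```

The trade-off, compared to option 1, is keeping these per-type subsets consistent as storage is added, removed, or reconfigured on datanodes.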
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)