[ https://issues.apache.org/jira/browse/HDFS-12008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060016#comment-16060016 ]
Kihwal Lee commented on HDFS-12008:
-----------------------------------

When the conf is set to balance the space all the time (1.0f), the underutilized nodes should be picked about 75% of the time. In the remaining 25% of cases, both picks are nodes with less free space (p = 0.5 * 0.5), so the end result is a node with less free space.

In branch-2.8, no matter what you set the probability to, the result comes out at 50%. I suspect this is because of the scope specified when {{chooseDataNode()}} is called. The test setup happens to make each rack full of either 100% free nodes or 50% free nodes, never mixed. So when two nodes are picked within a given rack scope, both are of the same kind: both 100% free or both 50% free. The random factor therefore doesn't matter, and the chance of picking underutilized nodes (100% free) becomes exactly 50%, which is the percentage of such nodes in the cluster.

If you change the number of racks to an odd number, or change the way a rack is assigned to each node, the chance of picking underutilized nodes rises to over 70%, closer to the theoretical 75%. So the test setup was wrong, and the test was also checking against a wrong expected result.

On trunk, the behavior is different. I haven't looked in detail, but it indicates that the scope is specified differently for {{chooseDataNode()}} compared to branch-2.8. I can see two nodes from different racks getting picked within a single {{chooseDataNode()}} call. If the scope is the same as before, it must be {{DFSNetworkTopology}} not honoring the scope. Either way, the behavior is different from branch-2.8.
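To make the 50% vs. 75% arithmetic concrete, here is a minimal standalone simulation. It is only a sketch and not the actual Hadoop code: the class name, the method {{underutilizedRate}}, the round-robin rack assignment (node i goes to rack i % numRacks, and even-numbered nodes are 100% free), the uniformly random choice of rack scope, and picking the two candidates with replacement are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch; it does not use or reproduce any Hadoop class.
class AvailableSpaceSketch {

    /**
     * Returns the fraction of trials in which a 100%-free node is chosen.
     * preference is the probability of keeping the candidate with more free
     * space when the two candidates differ (1.0 = always balance).
     */
    static double underutilizedRate(int numNodes, int numRacks,
                                    double preference, int trials, long seed) {
        Random rnd = new Random(seed);

        // Assumed layout: even-numbered nodes are 100% free, odd-numbered
        // nodes are 50% free, and node i is placed in rack i % numRacks.
        // With an even rack count, every rack holds only one kind of node.
        List<List<Double>> racks = new ArrayList<>();
        for (int r = 0; r < numRacks; r++) {
            racks.add(new ArrayList<>());
        }
        for (int i = 0; i < numNodes; i++) {
            double free = (i % 2 == 0) ? 1.0 : 0.5;
            racks.get(i % numRacks).add(free);
        }

        int hits = 0;
        for (int t = 0; t < trials; t++) {
            // The scope is a single rack; both candidates come from it.
            List<Double> scope = racks.get(rnd.nextInt(numRacks));
            double a = scope.get(rnd.nextInt(scope.size()));
            double b = scope.get(rnd.nextInt(scope.size()));

            double chosen;
            if (a == b) {
                chosen = a; // same free space: the preference is irrelevant
            } else if (rnd.nextDouble() < preference) {
                chosen = Math.max(a, b); // keep the node with more free space
            } else {
                chosen = Math.min(a, b);
            }
            if (chosen == 1.0) {
                hits++;
            }
        }
        return (double) hits / trials;
    }

    public static void main(String[] args) {
        // 20 nodes over 4 racks: each rack is homogeneous.
        System.out.printf("even racks: %.3f%n",
                underutilizedRate(20, 4, 1.0, 200_000, 42L));
        // 20 nodes over 5 racks: each rack holds both kinds of node.
        System.out.printf("odd racks:  %.3f%n",
                underutilizedRate(20, 5, 1.0, 200_000, 42L));
    }
}
```

Under these assumptions, the even-rack layout should stay near 0.50 regardless of the preference value, while the odd-rack layout with preference 1.0 should come out near 0.75. Picking the two candidates without replacement would push the mixed-rack figure somewhat higher than 0.75.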
> Improve the available-space block placement policy
> --------------------------------------------------
>
>                 Key: HDFS-12008
>                 URL: https://issues.apache.org/jira/browse/HDFS-12008
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: block placement
>    Affects Versions: 2.8.1
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>         Attachments: HDFS-12008.patch
>
>
> AvailableSpaceBlockPlacementPolicy currently picks two nodes unconditionally,
> then picks one node. It could avoid picking the second node when not
> necessary.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)