[ 
https://issues.apache.org/jira/browse/HDFS-12008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060016#comment-16060016
 ] 

Kihwal Lee commented on HDFS-12008:
-----------------------------------

When you set the conf to balance the space all the time (1.0f), the 
underutilized nodes should be picked about 75% of the time. In the remaining 
25% of cases, both picks land on nodes with less free space (p=0.5*0.5), so 
the end result is a node with less free space.  In branch-2.8, no matter what 
you set the probability to, the result comes out to 50%.
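
To make the arithmetic concrete, here is a minimal sketch of the expected pick 
rate. It assumes a simplified model (two independent picks, with f the fraction 
of underutilized nodes and the preference being the chance that the node with 
more free space wins when the two picks differ). The class and method names 
are hypothetical and are not part of HDFS.

```java
// Simplified model, not the actual HDFS code path.
final class PickProbability {
    // f: fraction of underutilized nodes among the candidates (assumption).
    // preference: chance the node with more free space wins when the two
    // picks differ (1.0 means "balance the space all the time").
    static double underutilizedPickChance(double f, double preference) {
        double bothUnderutilized = f * f;        // either pick is fine
        double oneOfEachKind = 2 * f * (1 - f);  // the preference decides
        return bothUnderutilized + oneOfEachKind * preference;
    }

    public static void main(String[] args) {
        // Half the nodes underutilized, always prefer more free space.
        System.out.println(underutilizedPickChance(0.5, 1.0)); // 0.75
        // No preference at all: same as the share of such nodes.
        System.out.println(underutilizedPickChance(0.5, 0.5)); // 0.5
    }
}
```

With f=0.5 and a preference of 1.0 this gives 0.25 + 0.5 = 0.75, which is the 
75% above; only a preference of 0.5 should give 50%.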

I suspect this is because of the scope specified when {{chooseDataNode()}} is 
called. The test setup happens to make each rack consist entirely of either 
100% free nodes or 50% free nodes, never a mix.  So, when two nodes are picked 
within a given rack scope, they can only be of one kind, both 100% free or 
both 50% free.  The random factor therefore doesn't matter, and the chance of 
picking underutilized nodes (100% free) becomes exactly 50%, the percentage of 
such nodes in the cluster.

If you change the number of racks to an odd number or change the way a rack is 
assigned to each node, the chance of picking underutilized nodes rises to over 
70%, closer to the theoretical 75%.  So the test setup was wrong, and it was 
also checking against a wrong expected result.
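
The rack-scope effect can be reproduced with a small simulation. This is only 
a sketch under assumptions: the scope is modeled as one uniformly chosen rack, 
both candidates are drawn from that rack with replacement, and the layout 
(4 racks of 10 nodes) is made up for illustration. None of these names come 
from the HDFS test or from {{chooseDataNode()}} itself.

```java
import java.util.Random;

// Simplified model of picking two candidates within one rack scope.
final class RackScopeSim {
    // free[r][i] is true when node i of rack r is underutilized (100% free).
    static double simulate(boolean[][] free, double preference,
                           int trials, long seed) {
        Random rnd = new Random(seed);
        int underutilizedPicks = 0;
        for (int t = 0; t < trials; t++) {
            // Assumption: the scope is a single rack chosen uniformly.
            boolean[] rack = free[rnd.nextInt(free.length)];
            boolean a = rack[rnd.nextInt(rack.length)];
            boolean b = rack[rnd.nextInt(rack.length)];
            // Same kind: nothing to decide. Different kinds: the
            // underutilized node wins with probability `preference`.
            boolean chosen = (a == b) ? a : rnd.nextDouble() < preference;
            if (chosen) {
                underutilizedPicks++;
            }
        }
        return (double) underutilizedPicks / trials;
    }

    // mixed=false: whole racks alternate between the two kinds of nodes.
    // mixed=true: every rack holds both kinds, alternating by node index.
    static boolean[][] layout(int racks, int nodesPerRack, boolean mixed) {
        boolean[][] free = new boolean[racks][nodesPerRack];
        for (int r = 0; r < racks; r++) {
            for (int i = 0; i < nodesPerRack; i++) {
                free[r][i] = mixed ? (i % 2 == 0) : (r % 2 == 0);
            }
        }
        return free;
    }

    public static void main(String[] args) {
        int trials = 200_000;
        double uniformRacks = simulate(layout(4, 10, false), 1.0, trials, 42L);
        double mixedRacks = simulate(layout(4, 10, true), 1.0, trials, 42L);
        System.out.printf("uniform racks: %.3f%n", uniformRacks);
        System.out.printf("mixed racks:   %.3f%n", mixedRacks);
    }
}
```

In this model the uniform-rack layout stays near 50% even with the preference 
at 1.0, while the mixed layout lands near 75%.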

Now, on trunk, the behavior is different. I haven't looked in detail, but it 
suggests the scope is specified differently for {{chooseDataNode()}} compared 
to branch-2.8.  I can see two nodes from different racks getting picked 
within a single {{chooseDataNode()}} call.  If the scope is the same as 
before, it must be {{DFSNetworkTopology}} not honoring the scope.  Either way, 
the behavior is different from branch-2.8.


> Improve the available-space block placement policy
> --------------------------------------------------
>
>                 Key: HDFS-12008
>                 URL: https://issues.apache.org/jira/browse/HDFS-12008
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: block placement
>    Affects Versions: 2.8.1
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>         Attachments: HDFS-12008.patch
>
>
> AvailableSpaceBlockPlacementPolicy currently picks two nodes unconditionally, 
> then picks one node. It could avoid picking the second node when not 
> necessary.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
