Ben,

That scenario should not happen. If one DN is serving 20 clients and the
other none (for the same block), the cluster (or the DN) has another problem.
Rack awareness is described here:
https://issues.apache.org/jira/secure/attachment/12345251/Rack_aware_HDFS_proposal.pdf

- Alex

--
Alexander Lorenz
http://mapredit.blogspot.com

On Jan 5, 2012, at 6:49 PM, Ben Clay wrote:

> Suresh-
> Thanks for the tips, I’ll check those functions out, and examine plugging in 
> a different NetworkTopology.
> So to clarify: under the current scheme, if a block has replicas on two
> local-rack nodes A and B, does it choose randomly between them? I.e., if
> DataNode A is serving 20 clients and DataNode B is serving 1 client, do
> they both have a 50% chance of being selected for the 21st client?
> -Ben
>  
> From: Suresh Srinivas [mailto:sur...@hortonworks.com] 
> Sent: Thursday, January 05, 2012 5:33 PM
> To: hdfs-user@hadoop.apache.org
> Subject: Re: HDFS load balancing for non-local reads
>  
> Currently it sorts the block locations as:
> 1. local node
> 2. local rack node
> 3. random order of remote nodes
> 
> See DatanodeManager#sortLocatedBlock(...) and 
> NetworkTopology#pseudoSortByDistance(...).
> 
> You can play around with other policies by plugging in different 
> NetworkTopology.
> 
> On Thu, Jan 5, 2012 at 1:40 PM, Ben Clay <rbc...@ncsu.edu> wrote:
> Hi-
>  
> How does the NameNode handle load balancing of non-local reads with multiple 
> block locations when locality is equal?
>  
> I.e., if the client is equidistant (same rack) from 2 DataNodes hosting the
> same block, does the NameNode consider current client count or any other load
> indicators when deciding which DataNode will satisfy the read request? Or
> is the client given a list of all split locations and allowed to make
> this choice itself?
>  
> Thanks!
>  
> -Ben
>  
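
For illustration, the ordering Suresh describes in the quoted message (local
node, then local rack node, then remote nodes in random order) can be sketched
roughly as below. This is only a sketch of the order as stated in the thread,
not the actual code in NetworkTopology#pseudoSortByDistance(...). ReplicaNode,
ReplicaOrdering and sortForReader are hypothetical names made up for the
example, and matching on host and rack strings is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for a DataNode location; not a Hadoop class.
final class ReplicaNode {
    final String host;
    final String rack;

    ReplicaNode(String host, String rack) {
        this.host = host;
        this.rack = rack;
    }
}

// Illustrative only: mirrors the three-tier order described in the thread,
// not the real NetworkTopology implementation.
final class ReplicaOrdering {
    static List<ReplicaNode> sortForReader(ReplicaNode reader,
                                           List<ReplicaNode> replicas,
                                           Random rng) {
        List<ReplicaNode> local = new ArrayList<>();
        List<ReplicaNode> sameRack = new ArrayList<>();
        List<ReplicaNode> remote = new ArrayList<>();

        for (ReplicaNode r : replicas) {
            if (r.host.equals(reader.host)) {
                local.add(r);      // tier 1: replica on the reader's own node
            } else if (r.rack.equals(reader.rack)) {
                sameRack.add(r);   // tier 2: replica on the reader's rack
            } else {
                remote.add(r);     // tier 3: everything else
            }
        }

        // Only the remote tier is randomized. Nothing in this sketch looks
        // at how many clients a node is already serving.
        Collections.shuffle(remote, rng);

        List<ReplicaNode> ordered = new ArrayList<>(replicas.size());
        ordered.addAll(local);
        ordered.addAll(sameRack);
        ordered.addAll(remote);
        return ordered;
    }
}
```

In this sketch, two same-rack replicas simply keep the order in which they
were passed in, so any load-aware tie-breaking would have to be added as a
separate policy.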
