Re: HDFS load balancing for non-local reads

alo.alt Fri, 06 Jan 2012 09:45:10 -0800

Ben,

That scenario should not happen. If one DN has 20 clients and the other has 
none (for the same block), the cluster (or the DN) has another problem. Rack 
awareness is described here:
https://issues.apache.org/jira/secure/attachment/12345251/Rack_aware_HDFS_proposal.pdf


- Alex

--
Alexander Lorenz
http://mapredit.blogspot.com

On Jan 5, 2012, at 6:49 PM, Ben Clay wrote:

> Suresh-
> Thanks for the tips, I’ll check those functions out, and examine plugging in 
> a different NetworkTopology.
> So to clarify: under the current scheme, if one block is on two local-rack 
> nodes, A and B, does it choose randomly between them? I.e., if DataNode A is 
> serving 20 clients and DataNode B is serving 1 client, do they both have a 
> 50% chance of being selected for the 21st client?
> -Ben
>  
> From: Suresh Srinivas [mailto:sur...@hortonworks.com] 
> Sent: Thursday, January 05, 2012 5:33 PM
> To: hdfs-user@hadoop.apache.org
> Subject: Re: HDFS load balancing for non-local reads
>  
> Currently it sorts the block locations as:
> # local node
> # local rack node
> # random order of remote nodes
> 
> See DatanodeManager#sortLocatedBlocks(...) and 
> NetworkTopology#pseudoSortByDistance(...).
> 
> You can experiment with other policies by plugging in a different 
> NetworkTopology.
> 
> On Thu, Jan 5, 2012 at 1:40 PM, Ben Clay <rbc...@ncsu.edu> wrote:
> Hi-
>  
> How does the NameNode handle load balancing of non-local reads with multiple 
> block locations when locality is equal?
>  
> I.e., if the client is equidistant (same rack) from 2 DataNodes hosting the 
> same block, does the NameNode consider current client count or any other load 
> indicators when deciding which DataNode will satisfy the read request?  Or 
> is the client given a list of all split locations and allowed to make this 
> choice itself?
>  
> Thanks!
>  
> -Ben
>
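
For readers following the thread, the ordering Suresh describes (local node,
then local rack node, then remote nodes in random order) can be sketched
roughly as below. This is only an illustration, not Hadoop's code: Replica and
ReplicaSorter are hypothetical names, and keeping same-rack nodes in their
input order is an assumption, since the thread does not settle how ties among
them are broken. Note that nothing in the sketch looks at client count or any
other load indicator.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical stand-in for a block replica location; not a Hadoop class. */
final class Replica {
    final String host;
    final String rack;

    Replica(String host, String rack) {
        this.host = host;
        this.rack = rack;
    }
}

/** Illustrative sorter that mimics the ordering described in the thread. */
final class ReplicaSorter {
    /**
     * Returns a new list ordered as: replica on the reader's own host,
     * replicas on the reader's rack, then all other replicas shuffled.
     */
    static List<Replica> sort(String readerHost, String readerRack,
                              List<Replica> replicas, Random rnd) {
        List<Replica> local = new ArrayList<>();
        List<Replica> sameRack = new ArrayList<>();
        List<Replica> remote = new ArrayList<>();

        for (Replica r : replicas) {
            if (r.host.equals(readerHost)) {
                local.add(r);          // 1. local node
            } else if (r.rack.equals(readerRack)) {
                sameRack.add(r);       // 2. local rack node (input order kept: an assumption)
            } else {
                remote.add(r);         // 3. remote node
            }
        }

        // Remote nodes are returned in random order; load is never consulted.
        Collections.shuffle(remote, rnd);

        List<Replica> sorted = new ArrayList<>(replicas.size());
        sorted.addAll(local);
        sorted.addAll(sameRack);
        sorted.addAll(remote);
        return sorted;
    }
}
```

A load-aware policy of the kind Ben asks about would need an extra input (for
example, a per-node active client count) used to order the same-rack group.
That input is not part of this sketch.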
