Ben,

That's defined in ReplicationTargetChooser: first the local node, second a node on the same rack, then a random one.
You're right, it's 50/50 when neither case one nor case two applies.
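To make the tie-breaking concrete, here is a minimal Java sketch of the ordering described in this thread (local node, then same rack, then the rest). It assumes that replicas at equal distance end up in uniformly random order, which is what this thread takes for granted. Node, distance(), sortForReader() and firstChoiceRate() are hypothetical names for illustration only. This is not Hadoop's NetworkTopology#pseudoSortByDistance, whose exact tie-breaking may differ between versions. The record syntax needs Java 16+.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical model of read ordering; NOT Hadoop's NetworkTopology implementation.
public class ReplicaOrderSketch {

    // Hypothetical replica location: host name plus rack name.
    record Node(String host, String rack) {}

    // 0 = same node, 1 = same rack, 2 = off rack.
    static int distance(Node reader, Node replica) {
        if (reader.host().equals(replica.host())) return 0;
        if (reader.rack().equals(replica.rack())) return 1;
        return 2;
    }

    // Shuffle first, then stable-sort by distance: replicas at equal distance
    // keep a uniformly random relative order. DataNode load is never consulted.
    static List<Node> sortForReader(Node reader, List<Node> replicas, Random rng) {
        List<Node> sorted = new ArrayList<>(replicas);
        Collections.shuffle(sorted, rng);
        sorted.sort((x, y) -> Integer.compare(distance(reader, x), distance(reader, y)));
        return sorted;
    }

    // Fraction of trials in which `target` is handed to the reader first.
    static double firstChoiceRate(Node reader, List<Node> replicas, Node target,
                                  int trials, long seed) {
        Random rng = new Random(seed);
        int hits = 0;
        for (int i = 0; i < trials; i++) {
            if (sortForReader(reader, replicas, rng).get(0).equals(target)) hits++;
        }
        return (double) hits / trials;
    }

    public static void main(String[] args) {
        Node a = new Node("dn-a", "/rack1");
        Node b = new Node("dn-b", "/rack1");
        Node c = new Node("dn-c", "/rack2");
        Node client = new Node("client-1", "/rack1"); // same rack as A and B, holds no replica
        List<Node> replicas = List.of(a, b, c);
        // A and B are equidistant from the client, so each should come first about
        // half the time, regardless of how many clients either is already serving.
        System.out.printf("A first: %.3f%n", firstChoiceRate(client, replicas, a, 10_000, 42L));
        System.out.printf("B first: %.3f%n", firstChoiceRate(client, replicas, b, 10_000, 42L));
        System.out.printf("C first: %.3f%n", firstChoiceRate(client, replicas, c, 10_000, 42L));
    }
}
```

With the client on /rack1, replicas on dn-a and dn-b (both /rack1) and dn-c (/rack2), A and B each come first in roughly half the trials and C never does. Nothing in this ordering looks at how many clients a DataNode is already serving, which is the behaviour Ben asked about.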

- Alex

--
Alexander Lorenz
http://mapredit.blogspot.com

On Jan 6, 2012, at 11:56 AM, Ben Clay wrote:

> Alex-
> 
> Understood. We do not have a situation that extreme; I was just looking for
> conceptual verification that reads are balanced across replicas of equal
> distance. From the PDF you linked:
> 
> "For reading, the name node first checks if the client's computer is located
> in the cluster. If yes, block locations are returned to the client in the
> order of its closeness to the reader. The block is read from data nodes in
> this preference order."
> 
> If two datanodes have equal closeness, I'd like to know how the NameNode
> chooses between them.
> 
> -Ben
> 
> -----Original Message-----
> From: alo.alt [mailto:wget.n...@googlemail.com] 
> Sent: Friday, January 06, 2012 12:45 PM
> To: hdfs-user@hadoop.apache.org
> Subject: Re: HDFS load balancing for non-local reads
> 
> Ben,
> 
> That scenario should not happen: if one DN has 20 clients and the other has
> zero (for the same block), the cluster (or the DN) has another problem. Rack
> Awareness is described here:
> https://issues.apache.org/jira/secure/attachment/12345251/Rack_aware_HDFS_proposal.pdf
> 
> - Alex
> 
> --
> Alexander Lorenz
> http://mapredit.blogspot.com
> 
> On Jan 5, 2012, at 6:49 PM, Ben Clay wrote:
> 
>> Suresh-
>> Thanks for the tips. I'll check those functions out and look into plugging
>> in a different NetworkTopology.
>> So to clarify: under the current scheme, if we have one block on two
>> local-rack nodes A and B, does it randomly choose between them? I.e., if
>> DataNode A is serving 20 clients and DataNode B is serving 1 client, do they
>> both have a 50% chance of being selected for the 21st client?
>> -Ben
>> 
>> From: Suresh Srinivas [mailto:sur...@hortonworks.com] 
>> Sent: Thursday, January 05, 2012 5:33 PM
>> To: hdfs-user@hadoop.apache.org
>> Subject: Re: HDFS load balancing for non-local reads
>> 
>> Currently it sorts the block locations as:
>> # local node
>> # local rack node
>> # random order of remote nodes
>> 
>> See DatanodeManager#sortLocatedBlock(...) and
>> NetworkTopology#pseudoSortByDistance(...).
>> 
>> You can play around with other policies by plugging in a different
>> NetworkTopology.
>> 
>> On Thu, Jan 5, 2012 at 1:40 PM, Ben Clay <rbc...@ncsu.edu> wrote:
>> Hi-
>> 
>> How does the NameNode handle load balancing of non-local reads with
>> multiple block locations when locality is equal?
>> 
>> I.e., if the client is equidistant (same rack) from 2 DataNodes hosting the
>> same block, does the NameNode consider the current client count or any other
>> load indicators when deciding which DataNode will satisfy the read request?
>> Or is the client given a list of all split locations and allowed to make
>> this choice itself?
>> 
>> Thanks!
>> 
>> -Ben
>> 
> 
> 
