Ben, that's defined in ReplicationTargetChooser: first local, second same rack, then random. You're right: it is 50/50 if cases one and two do not match.

- Alex

--
Alexander Lorenz
http://mapredit.blogspot.com

On Jan 6, 2012, at 11:56 AM, Ben Clay wrote:

> Alex-
>
> Understood. We do not have a situation that extreme, I was just looking for
> conceptual verification that reads are balanced across replicas of equal
> distance. From the PDF you linked:
>
> "For reading, the name node first checks if the client's computer is located
> in the cluster. If yes, block locations are returned to the client in the
> order of its closeness to the reader. The block is read from data nodes in
> this preference order."
>
> If two datanodes have equal closeness, I'd like to know how the NameNode
> chooses between them.
>
> -Ben
>
> -----Original Message-----
> From: alo.alt [mailto:wget.n...@googlemail.com]
> Sent: Friday, January 06, 2012 12:45 PM
> To: hdfs-user@hadoop.apache.org
> Subject: Re: HDFS load balancing for non-local reads
>
> Ben,
>
> the scenario should not happen, if one DN has 20 clients and the other zero
> (same block) the cluster (or DN) has another problem. Rack Awareness is
> described here:
> https://issues.apache.org/jira/secure/attachment/12345251/Rack_aware_HDFS_proposal.pdf
>
> - Alex
>
> --
> Alexander Lorenz
> http://mapredit.blogspot.com
>
> On Jan 5, 2012, at 6:49 PM, Ben Clay wrote:
>
>> Suresh-
>> Thanks for the tips, I'll check those functions out, and examine plugging
>> in a different NetworkTopology.
>> So to clarify, under the current scheme, if we have 1 block on two local
>> rack nodes A and B, it randomly chooses between those? IE, if DataNode A is
>> serving 20 clients and DataNode B is serving 1 client, they both have a 50%
>> chance of being selected for the 21st client?
>> -Ben
>>
>> From: Suresh Srinivas [mailto:sur...@hortonworks.com]
>> Sent: Thursday, January 05, 2012 5:33 PM
>> To: hdfs-user@hadoop.apache.org
>> Subject: Re: HDFS load balancing for non-local reads
>>
>> Currently it sorts the block locations as:
>> # local node
>> # local rack node
>> # random order of remote nodes
>>
>> See DatanodeManager#sortLocatedBlock(...) and
>> NetworkTopology#pseudoSortByDistance(...).
>>
>> You can play around with other policies by plugging in different
>> NetworkTopology.
>>
>> On Thu, Jan 5, 2012 at 1:40 PM, Ben Clay <rbc...@ncsu.edu> wrote:
>> Hi-
>>
>> How does the NameNode handle load balancing of non-local reads with
>> multiple block locations when locality is equal?
>>
>> IE, if the client is equidistant (same rack) from 2 DataNodes hosting the
>> same block, does the NameNode consider current client count or any other
>> load indicators when deciding which DataNode will satisfy the read request?
>> Or, is the client provided a list of all split locations and is allowed to
>> make this choice themselves?
>>
>> Thanks!
>>
>> -Ben
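
For illustration only, here is a small Python model of the ordering discussed in this thread: local node first, then local-rack nodes, then remote nodes, with replicas of equal closeness placed in random order. It is not Hadoop code and does not reproduce NetworkTopology#pseudoSortByDistance. The names Node, tier and sort_replicas are hypothetical, and the uniform tie-breaking is an assumption taken from the 50/50 description above. Note that no load information (such as client count) enters the ordering.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    """A host identified by name and rack (hypothetical model, not a Hadoop class)."""
    name: str
    rack: str


def tier(reader: Node, replica: Node) -> int:
    """Return 0 for the reader's own node, 1 for the same rack, 2 for any other rack."""
    if replica.name == reader.name:
        return 0
    if replica.rack == reader.rack:
        return 1
    return 2


def sort_replicas(reader: Node, replicas: List[Node], rng: random.Random) -> List[Node]:
    """Order replicas by tier; replicas within the same tier end up in random order.

    Assumption: ties are broken uniformly at random. Current load on a
    replica is deliberately not considered.
    """
    shuffled = list(replicas)
    rng.shuffle(shuffled)  # randomize first
    # sorted() is stable, so the shuffled order survives within each tier
    return sorted(shuffled, key=lambda replica: tier(reader, replica))


if __name__ == "__main__":
    reader = Node("client", "rack1")
    a, b, c = Node("A", "rack1"), Node("B", "rack1"), Node("C", "rack2")
    rng = random.Random(0)
    first = Counter(
        sort_replicas(reader, [a, b, c], rng)[0].name for _ in range(10_000)
    )
    # A and B each come first in roughly half of the runs; C never does.
    print(first)
```

Under this model, DataNode A serving 20 clients and DataNode B serving 1 client are equally likely to be listed first for the next reader on the same rack, which is the behaviour Ben asks about.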