Ben,

that scenario should not happen. If one DN is serving 20 clients and the other zero (for the same block), the cluster (or that DN) has another problem.

Rack awareness is described here:
https://issues.apache.org/jira/secure/attachment/12345251/Rack_aware_HDFS_proposal.pdf

- Alex

--
Alexander Lorenz
http://mapredit.blogspot.com

On Jan 5, 2012, at 6:49 PM, Ben Clay wrote:

> Suresh-
>
> Thanks for the tips, I’ll check those functions out, and examine plugging in
> a different NetworkTopology.
>
> So to clarify, under the current scheme, if we have 1 block on two local rack
> nodes A and B, it randomly chooses between those? IE, if DataNode A is
> serving 20 clients and DataNode B is serving 1 client, they both have a 50%
> chance of being selected for the 21st client?
>
> -Ben
>
> From: Suresh Srinivas [mailto:sur...@hortonworks.com]
> Sent: Thursday, January 05, 2012 5:33 PM
> To: hdfs-user@hadoop.apache.org
> Subject: Re: HDFS load balancing for non-local reads
>
> Currently it sorts the block locations as:
> # local node
> # local rack node
> # random order of remote nodes
>
> See DatanodeManager#sortLocatedBlock(...) and
> NetworkTopology#pseudoSortByDistance(...).
>
> You can play around with other policies by plugging in different
> NetworkTopology.
>
> On Thu, Jan 5, 2012 at 1:40 PM, Ben Clay <rbc...@ncsu.edu> wrote:
> Hi-
>
> How does the NameNode handle load balancing of non-local reads with multiple
> block locations when locality is equal?
>
> IE, if the client is equidistant (same rack) from 2 DataNodes hosting the
> same block, does the NameNode consider current client count or any other load
> indicators when deciding which DataNode will satisfy the read request? Or,
> is the client provided a list of all split locations and is allowed to make
> this choice themselves?
>
> Thanks!
>
> -Ben
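
For illustration only, the ordering Suresh lists in the quoted message (local node, then local rack node, then remote nodes in random order) can be sketched roughly as below. This is not Hadoop code: Node and sort_by_distance are hypothetical names, and keeping same-rack replicas in their input order is an assumption, since the thread does not settle how ties inside the local rack are broken. The sketch takes no client count or other load indicator as input; it orders by topology only.

```python
import random
from typing import List, NamedTuple, Optional


class Node(NamedTuple):
    """Hypothetical stand-in for a node in the topology (not a Hadoop class)."""
    host: str
    rack: str


def sort_by_distance(reader: Node, replicas: List[Node],
                     rng: Optional[random.Random] = None) -> List[Node]:
    """Order replicas as: local node, local-rack nodes, shuffled remote nodes."""
    rng = rng or random.Random()
    # Replica on the same host as the reader comes first.
    local = [n for n in replicas if n.host == reader.host]
    # Assumption: same-rack replicas keep their input order.
    same_rack = [n for n in replicas
                 if n.host != reader.host and n.rack == reader.rack]
    # Everything else is off-rack and is returned in random order.
    remote = [n for n in replicas
              if n.host != reader.host and n.rack != reader.rack]
    rng.shuffle(remote)
    return local + same_rack + remote


# Example: a client on /rack1 reading a block held by two same-rack
# DataNodes and one off-rack DataNode.
reader = Node("client1", "/rack1")
replicas = [Node("dnC", "/rack2"), Node("dnA", "/rack1"), Node("dnB", "/rack1")]
ordered = sort_by_distance(reader, replicas, random.Random(0))
```

In this example both same-rack DataNodes are listed ahead of the off-rack one, whatever number of clients each is currently serving.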