Suresh-

Thanks for the tips. I'll check those functions out and look into plugging
in a different NetworkTopology.

So to clarify: under the current scheme, if a block has replicas on two
local-rack nodes A and B, does it choose randomly between them? I.e., if
DataNode A is serving 20 clients and DataNode B is serving 1 client, do they
both have a 50% chance of being selected for the 21st client?

-Ben

 

From: Suresh Srinivas [mailto:sur...@hortonworks.com] 
Sent: Thursday, January 05, 2012 5:33 PM
To: hdfs-user@hadoop.apache.org
Subject: Re: HDFS load balancing for non-local reads

 

Currently it sorts the block locations as:
# local node
# local rack node
# random order of remote nodes

See DatanodeManager#sortLocatedBlocks(...) and
NetworkTopology#pseudoSortByDistance(...).

You can play around with other policies by plugging in a different
NetworkTopology.
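
For illustration only, the ordering described above (local node, then
local rack nodes, then remote nodes in random order) can be sketched as
below. This is not the Hadoop source: the class ReplicaOrderSketch, the
Location type, and sortForReader are hypothetical names made up for the
example. The sketch also assumes that replicas on the local rack keep
their incoming order, which the description above does not specify.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of the three-tier ordering; not Hadoop's implementation.
class ReplicaOrderSketch {

    // Hypothetical stand-in for a DataNode location: host name plus rack path.
    static final class Location {
        final String host;
        final String rack;

        Location(String host, String rack) {
            this.host = host;
            this.rack = rack;
        }

        @Override
        public String toString() {
            return rack + "/" + host;
        }
    }

    // Returns the replicas ordered as: local node, local rack, shuffled remote.
    static List<Location> sortForReader(Location reader,
                                        List<Location> replicas,
                                        Random rng) {
        List<Location> local = new ArrayList<>();
        List<Location> sameRack = new ArrayList<>();
        List<Location> remote = new ArrayList<>();
        for (Location r : replicas) {
            if (r.host.equals(reader.host)) {
                local.add(r);          // replica on the reader's own node
            } else if (r.rack.equals(reader.rack)) {
                sameRack.add(r);       // assumption: incoming order is kept
            } else {
                remote.add(r);         // off-rack replica
            }
        }
        Collections.shuffle(remote, rng); // random order of remote nodes

        List<Location> sorted = new ArrayList<>(local);
        sorted.addAll(sameRack);
        sorted.addAll(remote);
        return sorted;
    }

    public static void main(String[] args) {
        Location reader = new Location("h1", "/r1");
        List<Location> replicas = Arrays.asList(
                new Location("h9", "/r3"),
                new Location("h2", "/r1"),
                new Location("h1", "/r1"),
                new Location("h7", "/r2"));
        // The local node comes first, then the local rack node; the order
        // of the last two entries depends on the random source.
        System.out.println(sortForReader(reader, replicas, new Random()));
    }
}
```

Nothing in this sketch looks at DataNode load; a policy that did so
would need an extra input such as a per-node client count.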

On Thu, Jan 5, 2012 at 1:40 PM, Ben Clay <rbc...@ncsu.edu> wrote:

Hi-

 

How does the NameNode handle load balancing of non-local reads with multiple
block locations when locality is equal?

 

I.e., if the client is equidistant (same rack) from 2 DataNodes hosting the
same block, does the NameNode consider current client count or any other
load indicators when deciding which DataNode will satisfy the read request?
Or is the client provided a list of all split locations and allowed to
make this choice itself?

 

Thanks!

 

-Ben

 

 
