----- Original Message -----
From: Bharath Ravi <bharathra...@gmail.com>
Date: Wednesday, October 19, 2011 8:16 am
Subject: Re: Load balancing requests in HDFS
To: common-dev@hadoop.apache.org

> Thanks a lot Steve!
> 
> ReplicationTargetChooser seems to address load balancing for initially
> placing/laying out data,
> but it doesn't seem to do active load balancing for incoming requests
> to a datanode: or does it?

For every write request, ReplicationTargetChooser checks for good targets
(space, traffic, thread count on the DN, etc.). DNs update their statistics
via heartbeats, so the NN can consult this information before actually
choosing the target to write the data.
Hope this clarifies your doubt.
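As a rough illustration (a hypothetical sketch, not the actual
ReplicationTargetChooser code; all class and field names below are made up),
a load-aware choice over heartbeat stats might look like this: filter out
nodes without space for the block, skip nodes far busier than the cluster
average, then prefer the least-loaded remaining node.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TargetSketch {
    // Simplified stand-in for the stats a DataNode reports in its heartbeat.
    static class DataNodeStats {
        final String host;
        final long remainingBytes;   // free space
        final int xceiverCount;      // active transfer threads ("load")
        DataNodeStats(String host, long remainingBytes, int xceiverCount) {
            this.host = host;
            this.remainingBytes = remainingBytes;
            this.xceiverCount = xceiverCount;
        }
    }

    // A node is a "good target" if it has room for the block and is not
    // much busier than the cluster average -- roughly the kind of checks
    // the NN applies using heartbeat statistics.
    static String chooseTarget(List<DataNodeStats> nodes, long blockSize) {
        double avgLoad = nodes.stream()
                .mapToInt(n -> n.xceiverCount).average().orElse(0);
        return nodes.stream()
                .filter(n -> n.remainingBytes >= blockSize)     // enough space
                .filter(n -> n.xceiverCount <= 2 * avgLoad)     // not overloaded
                .min(Comparator.comparingInt(n -> n.xceiverCount))
                .map(n -> n.host)
                .orElse(null);
    }

    public static void main(String[] args) {
        List<DataNodeStats> nodes = Arrays.asList(
                new DataNodeStats("dn1", 10_000_000L, 12),  // busy
                new DataNodeStats("dn2", 10_000_000L, 3),
                new DataNodeStats("dn3", 1_000L, 1));       // too little space
        System.out.println(chooseTarget(nodes, 128_000L)); // dn2
    }
}
```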

> 
> Also, would you know if there are statistics on how effective
> over-replication is for throughput gain?
> Basically, although one might add more replicas, are they actually used
> effectively to serve incoming requests?
Here, does over-replication mean upping the replication factor?
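On that point: extra replicas only raise read throughput if clients actually
spread their reads across them. A toy sketch (hypothetical, not HDFS code)
with a round-robin client shows the ideal case, where each replica serves an
equal share of the reads:

```java
import java.util.HashMap;
import java.util.Map;

public class ReplicaReadSpread {
    // Assign `reads` requests across `replicas` copies, round-robin.
    static Map<Integer, Integer> simulate(int replicas, int reads) {
        Map<Integer, Integer> hits = new HashMap<>();
        for (int i = 0; i < reads; i++) {
            hits.merge(i % replicas, 1, Integer::sum); // next replica in turn
        }
        return hits;
    }

    public static void main(String[] args) {
        // With 3 replicas and 9000 reads, each replica serves 3000.
        System.out.println(simulate(3, 9000)); // {0=3000, 1=3000, 2=3000}
    }
}
```

If clients instead always read the first replica returned, the added copies
sit idle, which is exactly the concern raised above.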

> 
> On 18 October 2011 12:37, Steve Loughran <ste...@apache.org> wrote:
> 
> > On 16/10/11 02:53, Bharath Ravi wrote:
> >
> >> Hi all,
> >>
> >> I have a question about how HDFS load balances requests for
> >> files/blocks:
> >>
> >> HDFS currently distributes data blocks randomly, for balance.
> >> However, if certain files/blocks are more popular than others, some
> >> nodes might get an "unfair" number of requests.
> >> Adding more replicas for these popular files might not help, unless
> >> HDFS explicitly distributes requests fairly among the replicas.
> >>
> >
> > Have a look at the ReplicationTargetChooser class; it does take
> > datanode load into account, though its concern is distribution for
> > data availability, not performance.
> >
> > The standard technique for popular files -including MR job JAR
> > files- is to over-replicate. One problem: how to determine what is
> > popular without adding more load on the namenode.
> >
> 
> 
> 
> -- 
> Bharath Ravi
> 

Regards,
Uma
