Hi Virajith,

The default FIFO scheduler just isn't optimized for locality for small jobs. 
You should be able to get substantially more locality even with 1 replica if 
you use the fair scheduler, although the version of the scheduler in 0.20 
doesn't contain the locality optimization. Try the Cloudera distribution to get 
a 0.20-compatible Hadoop that does contain it.

I also think your 10% figure, inferred from completion times, might be a 
little off: you have many more data blocks than nodes, so it should be easy 
to make the first few waves of tasks data-local. Try a version of Hadoop 
that correctly reports this counter.
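
As a rough sanity check on the numbers from the original post, here is a 
minimal Python sketch. The function name is hypothetical, and it assumes 
that MB and GB are binary units, that "50Mbps" means 50 * 10^6 bits/s, and 
that blocks are spread evenly over the datanodes. None of this comes from 
Hadoop itself; it is only back-of-the-envelope arithmetic.

```python
MB = 1024 ** 2
GB = 1024 ** 3

def locality_estimates(input_bytes, block_bytes, nodes, link_bits_per_sec):
    """Back-of-the-envelope numbers for a single-replica job (assumptions above)."""
    # One map task per HDFS block.
    num_maps = input_bytes // block_bytes
    # Blocks held by each datanode if placement is perfectly even (assumption).
    blocks_per_node = num_maps / nodes
    # Fraction of local tasks if tasks landed on nodes with no regard to
    # block location and each block has exactly one replica.
    random_local_fraction = 1 / nodes
    # Time to pull one whole block over the throttled link, ignoring overheads.
    remote_read_secs = block_bytes * 8 / link_bits_per_sec
    return num_maps, blocks_per_node, random_local_fraction, remote_read_secs

est = locality_estimates(200 * GB, 128 * MB, 20, 50e6)
print(est)
```

Under these assumptions there are 1600 maps, about 80 blocks per node, and 
a remote read of one block takes roughly 21 seconds, which is why completion 
time looks like a usable signal. Placement that ignored locality entirely 
would give about 5% local maps, so 10% would be barely better than that.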

Matei

On Jul 12, 2011, at 1:27 PM, Virajith Jalaparti wrote:

> I agree that the scheduler has less leeway when the replication factor is 
> 1. However, I would still expect the number of data-local tasks to be more 
> than 10% even when the replication factor is 1. Presumably, the scheduler 
> would have a greater number of opportunities to schedule data-local tasks 
> than just 10%. (Please note that I am inferring that a map was 
> non-local based on the observed completion time. I don't know why but the 
> logs of my jobs don't show the DATA_LOCAL_MAPS counter information.)
> 
> I will try using higher replication factors and see how much improvement I 
> can get.
> 
> Thanks,
> Virajith
> 
> On Tue, Jul 12, 2011 at 6:15 PM, Arun C Murthy <a...@hortonworks.com> wrote:
> As Aaron mentioned the scheduler has very little leeway when you have a 
> single replica.
> 
> OTOH, schedulers equate rack-locality to node-locality - this makes sense 
> for a large-scale system since intra-rack b/w is good enough for most 
> installs of Hadoop.
> 
> Arun
> 
> On Jul 12, 2011, at 7:36 AM, Virajith Jalaparti wrote:
> 
>> I am using a replication factor of 1 since I don't want to incur the 
>> overhead of replication and I am not much worried about reliability. 
>> 
>> I am just using the default Hadoop scheduler (FIFO, I think!). In case of a 
>> single rack, rack-locality doesn't really have any meaning. Obviously 
>> everything will run in the same rack. I am concerned about data-local maps. 
>> I assumed that Hadoop would do a much better job at ensuring data-local maps 
>> but it doesn't seem to be the case here.
>> 
>> -Virajith
>> 
>> On Tue, Jul 12, 2011 at 3:30 PM, Arun C Murthy <a...@hortonworks.com> wrote:
>> Why are you running with replication factor of 1?
>> 
>> Also, it depends on the scheduler you are using. The CapacityScheduler in 
>> 0.20.203 (not 0.20.2) has much better locality for jobs, similarly with 
>> FairScheduler.
>> 
>> IAC, running on a single rack with replication of 1 implies rack-locality 
>> for all tasks which, in most cases, is good enough.
>> 
>> Arun
>> 
>> On Jul 12, 2011, at 5:45 AM, Virajith Jalaparti wrote:
>> 
>> > Hi,
>> >
>> > I was trying to run the Sort example in Hadoop-0.20.2 over 200GB of input 
>> > data using a 20-node cluster. HDFS is configured to use a 128MB 
>> > block size (so 1600 maps are created) and a replication factor of 1 is 
>> > being used. All the 20 nodes are also HDFS datanodes. I was using a 
>> > bandwidth value of 50Mbps between each of the nodes (this was configured 
>> > using Linux "tc"). I see that around 90% of the map tasks are reading data 
>> > over the network, i.e. most of the map tasks are not being scheduled at the 
>> > nodes where the data to be processed by them is located.
>> > My understanding was that Hadoop tries to schedule as many data-local maps 
>> > as possible. But in this situation, this does not seem to happen. Any 
>> > reason why this is happening? And is there a way to actually configure 
>> > Hadoop to ensure the maximum possible node locality?
>> > Any help regarding this is very much appreciated.
>> >
>> > Thanks,
>> > Virajith
>> 
>> 
> 
> 
