Hi Matei, Using the fair scheduler of the cloudera distribution seems to have (mostly) solved the problem. Thanks a lot for the suggestion.
-Virajith On Tue, Jul 12, 2011 at 7:23 PM, Matei Zaharia <ma...@eecs.berkeley.edu>wrote: > Hi Virajith, > > The default FIFO scheduler just isn't optimized for locality for small > jobs. You should be able to get substantially more locality even with 1 > replica if you use the fair scheduler, although the version of the scheduler > in 0.20 doesn't contain the locality optimization. Try the Cloudera > distribution to get a 0.20-compatible Hadoop that does contain it. > > I also think your value of 10% inferred on completion time might be a > little off, because you have quite a few more data blocks than nodes so it > should be easy to make the first few waves of tasks data-local. Try a > version of Hadoop that correctly measures this counter. > > Matei > > On Jul 12, 2011, at 1:27 PM, Virajith Jalaparti wrote: > > I agree that the scheduler has lesser leeway when the replication factor is > 1. However, I would still expect the number of data-local tasks to be more > than 10% even when the replication factor is 1. Presumably, the scheduler > would have greater number of opportunities to schedule data-local tasks as > compared to just 10%. (Please note that I am inferring that a map was > non-local based on the observed completion time. I don't know why but the > logs of my jobs don't show the DATA_LOCAL_MAPS counter information.) > > I will try using higher replication factors and see how much improvement I > can get. > > Thanks, > Virajith > > On Tue, Jul 12, 2011 at 6:15 PM, Arun C Murthy <a...@hortonworks.com>wrote: > >> As Aaron mentioned the scheduler has very little leeway when you have a >> single replica. >> >> OTOH, schedulers equate rack-locality to node-locality - this makes sense >> sense for a large-scale system since intra-rack b/w is good enough for most >> installs of Hadoop. >> >> Arun >> >> On Jul 12, 2011, at 7:36 AM, Virajith Jalaparti wrote: >> >> I am using a replication factor of 1 since I dont to incur the overhead of >> replication and I am not much worried about reliability. >> >> I am just using the default Hadoop scheduler (FIFO, I think!). In case of >> a single rack, rack-locality doesn't really have any meaning. Obviously >> everything will run in the same rack. I am concerned about data-local maps. >> I assumed that Hadoop would do a much better job at ensuring data-local maps >> but it doesnt seem to be the case here. >> >> -Virajith >> >> On Tue, Jul 12, 2011 at 3:30 PM, Arun C Murthy <a...@hortonworks.com>wrote: >> >>> Why are you running with replication factor of 1? >>> >>> Also, it depends on the scheduler you are using. The CapacityScheduler in >>> 0.20.203 (not 0.20.2) has much better locality for jobs, similarly with >>> FairScheduler. >>> >>> IAC, running on a single rack with replication of 1 implies rack-locality >>> for all tasks which, in most cases, is good enough. >>> >>> Arun >>> >>> On Jul 12, 2011, at 5:45 AM, Virajith Jalaparti wrote: >>> >>> > Hi, >>> > >>> > I was trying to run the Sort example in Hadoop-0.20.2 over 200GB of >>> input data using a 20 node cluster of nodes. HDFS is configured to use 128MB >>> block size (so 1600maps are created) and a replication factor of 1 is being >>> used. All the 20 nodes are also hdfs datanodes. I was using a bandwidth >>> value of 50Mbps between each of the nodes (this was configured using linux >>> "tc"). I see that around 90% of the map tasks are reading data over the >>> network i.e. most of the map tasks are not being scheduled at the nodes >>> where the data to be processed by them is located. >>> > My understanding was that Hadoop tries to schedule as many data-local >>> maps as possible. But in this situation, this does not seem to happen. Any >>> reason why this is happening? and is there a way to actually configure >>> hadoop to ensure the maximum possible node locality? >>> > Any help regarding this is very much appreciated. >>> > >>> > Thanks, >>> > Virajith >>> >>> >> >> > >