Re: Task allocation to TaskTrackers

Nigel Daley Wed, 14 Feb 2007 10:06:27 -0800

Hi Vasiliy :)

I have a question regarding task allocation to TaskTrackers (could not find an answer in the docs). When a MapReduce job is run, does the system attempt to schedule a Map task on a machine that contains a replica of the task's input data, or not?

Yes, the JobTracker attempts to schedule the map on a node containing that map's input split.

If yes, how does the system know which TaskTracker corresponds to which DataNode (by IP address, by host name, or by something else)?

See InputSplit.getLocations() (http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/java/org/apache/hadoop/mapred/InputSplit.java?view=markup). Currently, host names are used, but I believe it's moving to IP addresses (see https://issues.apache.org/jira/browse/HADOOP-985).
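
As a rough illustration of the host-name matching described above, here is a minimal sketch in plain Java. It does not use any Hadoop classes and is not the JobTracker's actual code: the class SplitLocality, the method isLocal, and the example host names are all hypothetical. The only thing taken from the thread is the idea that a split reports the host names holding its data, as InputSplit.getLocations() does.

```java
class SplitLocality {
    // Hypothetical helper: returns true if the TaskTracker's host name
    // appears in the list of hosts that hold the split's data.
    // Host names are compared case-insensitively (an assumption made here).
    static boolean isLocal(String trackerHost, String[] splitLocations) {
        for (String host : splitLocations) {
            if (host.equalsIgnoreCase(trackerHost)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Made-up host names standing in for the result of getLocations().
        String[] locations = {"node1.example.com", "node7.example.com"};
        System.out.println(isLocal("node7.example.com", locations));
        System.out.println(isLocal("node3.example.com", locations));
    }
}
```

A comparison like this only works if the TaskTracker and the DataNode on the same machine report the same name, which is presumably one reason for the move towards IP addresses.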

Also, what happens if that fails?

The task is scheduled elsewhere. However, now that DataNodes are aware of the rack they are on (as of 0.11.0), the JobTracker needs to be modified so that its fallback is to attempt to locate the map on a node "close" to its data (on the same rack).
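
The fallback order described above (same node, then same rack, then anywhere) could look roughly like the sketch below. This is only an illustration under stated assumptions, not the JobTracker's implementation: the class LocalityFallback, the method pickTracker, the host-to-rack map, and the host and rack names are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LocalityFallback {
    // Hypothetical: choose a tracker host for a split, preferring
    // (1) a host that holds the data, (2) a host on the same rack as
    // the data, (3) any free tracker. Returns null if none is free.
    static String pickTracker(List<String> splitHosts,
                              Map<String, String> rackOfHost,
                              List<String> freeTrackers) {
        // 1. Node-local: the tracker runs on a host holding the split.
        for (String tracker : freeTrackers) {
            if (splitHosts.contains(tracker)) {
                return tracker;
            }
        }
        // 2. Rack-local: the tracker shares a rack with some data host.
        for (String tracker : freeTrackers) {
            String rack = rackOfHost.get(tracker);
            if (rack == null) {
                continue; // rack unknown, cannot judge closeness
            }
            for (String dataHost : splitHosts) {
                if (rack.equals(rackOfHost.get(dataHost))) {
                    return tracker;
                }
            }
        }
        // 3. Otherwise schedule the task elsewhere.
        return freeTrackers.isEmpty() ? null : freeTrackers.get(0);
    }

    public static void main(String[] args) {
        Map<String, String> racks = new HashMap<>();
        racks.put("a", "rack1");
        racks.put("b", "rack1");
        racks.put("c", "rack1");
        racks.put("d", "rack2");
        List<String> splitHosts = Arrays.asList("a", "b");
        // "c" is not a data host but shares rack1 with the data.
        System.out.println(pickTracker(splitHosts, racks, Arrays.asList("d", "c")));
    }
}
```

In this example the data lives on hosts a and b, neither of which is free, so the sketch picks c (same rack) over d (different rack).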


Cheers,
Nige
