Re: Task allocation to TaskTrackers

Vasiliy Baranov Wed, 14 Feb 2007 13:48:46 -0800

Hi Nigel,

It is so nice to hear from you again!

Thank you for clarifying these. I have a follow-up question: how are racks configured, that is, how does the system know which rack a machine is in? I went through the proposal (https://issues.apache.org/jira/secure/attachment/12345251/Rack_aware_HDFS_proposal.pdf) and the patch source code (https://issues.apache.org/jira/secure/attachment/12350262/rack.patch), and it looks like the DataNode receives its rack's "network location" via the new -r/--rack option or, if that option is not specified, by Runtime.exec-ing the "dfs.network.script" script. If neither is specified, the DataNode belongs to the default rack. Correct?
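
To make sure I am reading the patch correctly, here is a rough sketch of the lookup order as I understand it. This is not the patch's code: the class name, the method name, and the "/default-rack" value are my own placeholders, and falling back to the default rack when the script fails is an assumption on my part.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class RackLocationSketch {
    // Placeholder value; the actual default rack name in the patch may differ.
    static final String DEFAULT_RACK = "/default-rack";

    /**
     * rackOption: value passed via -r/--rack, or null if absent.
     * scriptPath: value of dfs.network.script, or null if not configured.
     */
    static String resolve(String rackOption, String scriptPath) {
        // 1. An explicit command-line option wins.
        if (rackOption != null && !rackOption.isEmpty()) {
            return rackOption;
        }
        // 2. Otherwise run the configured script and use its first output line.
        if (scriptPath != null && !scriptPath.isEmpty()) {
            try {
                Process p = Runtime.getRuntime().exec(new String[] { scriptPath });
                try (BufferedReader r = new BufferedReader(
                        new InputStreamReader(p.getInputStream()))) {
                    String line = r.readLine();
                    if (p.waitFor() == 0 && line != null && !line.trim().isEmpty()) {
                        return line.trim();
                    }
                }
            } catch (IOException e) {
                // Assumption: a script that cannot be run means "use the default rack".
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        // 3. Neither source produced a location.
        return DEFAULT_RACK;
    }
}
```

With this reading, resolve("/rack1", null) gives "/rack1" and resolve(null, null) gives the default rack.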


Thank you,
Vasiliy

Nigel Daley wrote:

Hi Vasiliy :)
I have a question regarding task allocation to TaskTrackers (could not find an answer in the docs). When a MapReduce job is run, does the system attempt to schedule a Map task on a machine that contains a replica of the task's input data, or not?
Yes, the JobTracker attempts to schedule the map on a node containing that map's input split.
If yes, how does the system know which TaskTracker corresponds to which DataNode (by IP address, by host name, or by something else)?
See InputSplit.getLocations() (http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/java/org/apache/hadoop/mapred/InputSplit.java?view=markup). Currently, host names are used, but I believe it's moving to IP addresses (see https://issues.apache.org/jira/browse/HADOOP-985).
Also, what happens if that fails?
The task is scheduled elsewhere. However, now that DataNodes are aware of the rack they are on (as of 0.11.0), the JobTracker needs to be modified so that its fallback is to attempt to locate the map on a node "close" to its data (on the same rack).
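
Roughly, the preference order I have in mind looks like the sketch below. It is only an illustration, not JobTracker code: pickHost, the rackOf map, and the idea of choosing from a list of free hosts are all made up for this example.

```java
import java.util.List;
import java.util.Map;

public class LocalitySketch {
    /**
     * splitLocations: hosts holding a replica of the split (as from getLocations()).
     * freeHosts: hosts that can currently accept a map task (hypothetical input).
     * rackOf: host name to rack name (hypothetical lookup table).
     * Returns the chosen host, or null if no host is free.
     */
    static String pickHost(List<String> splitLocations,
                           List<String> freeHosts,
                           Map<String, String> rackOf) {
        // 1. Prefer a host that holds the data itself.
        for (String host : freeHosts) {
            if (splitLocations.contains(host)) {
                return host;
            }
        }
        // 2. Otherwise prefer a host on the same rack as some replica.
        for (String host : freeHosts) {
            String rack = rackOf.get(host);
            if (rack == null) {
                continue;
            }
            for (String location : splitLocations) {
                if (rack.equals(rackOf.get(location))) {
                    return host;
                }
            }
        }
        // 3. Otherwise take any free host (today's behaviour after step 1 fails).
        return freeHosts.isEmpty() ? null : freeHosts.get(0);
    }
}
```

Step 2 is the part that does not exist yet; today the scheduler goes straight from step 1 to step 3.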
Cheers,
Nige
