Moving to mapreduce-dev@ (bcc common-dev@).

Responses inline:

On Mar 29, 2010, at 7:02 PM, Mike Cardosa wrote:

1) When the jobtracker assigns a task to a tasktracker, it determines
if the task is data-local or rack-local from the splits (which were
generated during the job init process). Where in the code could I
"refresh" the split locations in case they have changed or blocks have
been replicated to additional new datanodes?


There's no easy way to do that. In practice, though, I don't think it matters much.

2) When a tasktracker is assigned a map task, is it informed if it's a
data-local or rack-local map task? If so, where in the code does this
take place, and is it possible to patch the code to have it check to
see if it has a data-local copy of the block first before going to the
network to download the block from another datanode?


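No, the TaskTracker (TT) doesn't know or care whether a map task is data-local or rack-local. The DFSClient used by the map task has the smarts to do its I/O from the 'nearest' datanode, so a local copy of the block is read first when one exists.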
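
To make the 'nearest' idea concrete, here is a rough, self-contained sketch of ordering replicas by distance from the reader. This is not the actual DFSClient code: the class NearestReplica, the Node type, and the distance values (0 for same host, 1 for same rack, 2 for off-rack) are all made up for illustration, assuming each node is described only by a hostname and a rack id.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative only: hypothetical names, not Hadoop classes.
public class NearestReplica {

    /** Hypothetical node description: a hostname plus a rack id. */
    static final class Node {
        final String host;
        final String rack;

        Node(String host, String rack) {
            this.host = host;
            this.rack = rack;
        }
    }

    /** Assumed distance scale: 0 = same host, 1 = same rack, 2 = off-rack. */
    static int distance(Node reader, Node replica) {
        if (reader.host.equals(replica.host)) {
            return 0;
        }
        if (reader.rack.equals(replica.rack)) {
            return 1;
        }
        return 2;
    }

    /** Returns a copy of the replica list, closest replica first. */
    static List<Node> sortByDistance(final Node reader, List<Node> replicas) {
        List<Node> sorted = new ArrayList<Node>(replicas);
        sorted.sort(Comparator.comparingInt((Node r) -> distance(reader, r)));
        return sorted;
    }

    public static void main(String[] args) {
        Node reader = new Node("host-a", "/rack1");
        List<Node> replicas = Arrays.asList(
                new Node("host-c", "/rack2"),
                new Node("host-b", "/rack1"),
                new Node("host-a", "/rack1"));

        // The reader would try replicas in this order.
        for (Node n : sortByDistance(reader, replicas)) {
            System.out.println(n.host + " distance=" + distance(reader, n));
        }
    }
}
```

The point of the sketch is that the ordering happens on the reading side at read time, which is why the TaskTracker does not need to be told whether the task is data-local.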

Arun
