I am interested in a few things, all pertaining to HDFS block
locations for running map tasks. I have spent several days looking
through the Hadoop source code and have arrived at a couple of
questions that are still plaguing me.

1) When the JobTracker assigns a task to a TaskTracker, it determines
whether the task is data-local or rack-local from the splits (which
were generated during job init). Where in the code could I "refresh"
the split locations in case they have changed, or blocks have been
replicated to additional datanodes?
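To be concrete about what I mean by "refresh": something like the
sketch below. The class and field names here are all made up for
illustration (this is not Hadoop's actual code); it just shows the
shape of re-resolving a split's cached hosts from a current
block-to-datanodes mapping, which in the real system would come from
the namenode.

```java
import java.util.*;

// Sketch only: SplitRefresh and its fields are invented for
// illustration; they are not Hadoop classes.
public class SplitRefresh {
    // Split locations cached at job-init time: block id -> datanode hosts.
    static Map<String, List<String>> cachedLocations = new HashMap<>();

    // Overwrite the cached hosts with a fresh block -> hosts mapping,
    // e.g. after replication has placed blocks on new datanodes.
    static void refresh(Map<String, List<String>> freshBlockMap) {
        for (Map.Entry<String, List<String>> e : freshBlockMap.entrySet()) {
            if (cachedLocations.containsKey(e.getKey())) {
                cachedLocations.put(e.getKey(), new ArrayList<>(e.getValue()));
            }
        }
    }
}
```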

2) When a TaskTracker is assigned a map task, is it informed whether
the task is data-local or rack-local? If so, where in the code does
this take place? And is it possible to patch the code so the
TaskTracker first checks whether it holds a local copy of the block
before going over the network to fetch it from another datanode?
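To illustrate the classification I am asking about, here is a
self-contained sketch (again, invented names, not Hadoop's actual
classes) of deciding a task's locality level by comparing the
TaskTracker's own host and rack against the split's block locations:

```java
import java.util.List;

// Sketch only: LocalityCheck is invented for illustration.
public class LocalityCheck {
    enum Locality { DATA_LOCAL, RACK_LOCAL, OFF_RACK }

    // splitHosts: datanode hostnames holding replicas of the split's
    // block; splitRacks: the racks of those hosts.
    static Locality classify(String myHost, String myRack,
                             List<String> splitHosts,
                             List<String> splitRacks) {
        if (splitHosts.contains(myHost)) return Locality.DATA_LOCAL;
        if (splitRacks.contains(myRack)) return Locality.RACK_LOCAL;
        return Locality.OFF_RACK;
    }
}
```

If the TaskTracker were told (or could compute) this at assignment
time, the DATA_LOCAL case is exactly where I would want it to read the
block locally instead of fetching it.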

Thanks for your time in advance.
Regards,
Mike Cardosa
