Every input split in a job carries a list of the hosts where its blocks reside. The JobTracker (JT) combines that list with the network hierarchy (node/rack topology) it keeps to determine the locality level.
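To make that concrete, here is a minimal, self-contained sketch of the idea. It is not Hadoop's code and uses none of its classes. It assumes hosts are identified by topology paths like "/rack1/host1" (the style rack-aware topologies use). Level 0 means node-local, 1 means rack-local and 2 means off-rack. The names LocalitySketch, matchingLevel, localityLevel and MAX_LEVEL are hypothetical, chosen for illustration only.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch, not Hadoop's implementation: hosts are modelled as
 * topology path strings such as "/rack1/host1".
 * Level 0 = node-local, 1 = rack-local, 2 = off-rack.
 */
final class LocalitySketch {

    // Assumed two-level topology (rack/host), so 2 is the worst level.
    static final int MAX_LEVEL = 2;

    /**
     * Walks both paths up the hierarchy in lock step and returns how many
     * steps it took before they met at a common ancestor.
     * Assumes both paths have the same depth.
     */
    static int matchingLevel(String trackerPath, String dataNodePath) {
        String a = trackerPath;
        String b = dataNodePath;
        for (int level = 0; level < MAX_LEVEL; level++) {
            if (a.equals(b)) {
                return level;
            }
            a = parent(a);
            b = parent(b);
        }
        return MAX_LEVEL;
    }

    // "/rack1/host1" -> "/rack1"; "/rack1" -> "" (the root).
    static String parent(String path) {
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? "" : path.substring(0, slash);
    }

    /**
     * Best (lowest) level a tracker can get for a split, taken over every
     * host that holds one of the split's blocks.
     */
    static int localityLevel(String trackerPath, List<String> splitLocations) {
        int best = MAX_LEVEL;
        for (String location : splitLocations) {
            best = Math.min(best, matchingLevel(trackerPath, location));
            if (best == 0) {
                break; // node-local cannot be beaten, so stop early
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> split = Arrays.asList("/rack1/host1", "/rack2/host5");
        System.out.println(localityLevel("/rack1/host1", split)); // 0: same host
        System.out.println(localityLevel("/rack2/host6", split)); // 1: same rack
        System.out.println(localityLevel("/rack3/host9", split)); // 2: off-rack
    }
}
```

Taking the minimum over all split locations reflects that a split is usually replicated on several hosts, and a tracker only needs to be close to one of them.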
Have a look at JobInProgress.getLocalityLevel(). It takes a TaskInProgress object and a TaskTrackerStatus (obtained from the TaskTracker's heartbeats) and determines the level of locality the task would get if it were scheduled on that particular TT. You can dig down or up from there :)

On Fri, Jan 14, 2011 at 4:18 AM, Pedro Costa <psdc1...@gmail.com> wrote:
> Hi,
>
> I have Hadoop installed on a cluster, and I would like the JT to work out,
> from the network topology, which input files in HDFS are closer to it and
> which are farther away.
> So, how can the JT know whether an input file is located at the local level,
> at the rack level, or at some other level?
>
> Thanks,
> --
> Pedro

--
Harsh J
www.harshj.com