Hello again,

The scheduler does take care of locality with a good algorithm. However, the String[] of locations is for one given logical split. It does not matter to the scheduler what your split really contains (a single block, multiple blocks, or offsets within them); it only looks at 'where' that particular split can be run most favorably, according to the location hostnames supplied. For every host supplied, it adds an entry to a cache of <Node, Set<Local_TIPs>>.
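To make that concrete, here is a minimal sketch (mine, not from this thread; the class and field names are invented) of a new-API custom split. The String[] returned by getLocations() is the only locality information the scheduler ever sees for it:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.InputSplit;

// Hypothetical combined split; only getLocations() matters to the scheduler's
// <Node, Set<Local_TIPs>> bookkeeping.
public class CombinedSplit extends InputSplit implements Writable {
  private long length;
  private String[] hosts;

  public CombinedSplit() {}                       // needed for deserialization

  public CombinedSplit(long length, String[] hosts) {
    this.length = length;
    this.hosts = hosts;
  }

  @Override
  public long getLength() {
    return length;
  }

  @Override
  public String[] getLocations() {
    // Each hostname returned here gets this split filed under that node in the
    // scheduler's cache; the order of the array is not considered.
    return hosts;
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeLong(length);
    out.writeInt(hosts.length);
    for (String h : hosts) {
      Text.writeString(out, h);
    }
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    length = in.readLong();
    hosts = new String[in.readInt()];
    for (int i = 0; i < hosts.length; i++) {
      hosts[i] = Text.readString(in);
    }
  }
}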
For a split covering two file offsets, with one host holding 100% of the bytes and another holding 60%, supplying the 100%-local host alone would be the best idea. Have I understood your question right here? (A rough sketch of that idea is pasted below my signature.)

On Thu, Apr 21, 2011 at 6:54 PM, Johannes Zillmann <[email protected]> wrote:
> So how does inputSplit#getLocations() influence the task distribution ? I
> assume a split is assigned in favor to a tasktracker which matches one of its
> location !?
> Assuming that we have a combined split and there is one location with 100%
> data-locality and one location with 60% data-locality.
> If the task-scheduler doesn't care about the order of specified locations
> wouldn't it be better to specify as split-location only the 100% location so
> that it's more likely that the split is assigned to that host ?
>
>
> Johannes
>
> On Apr 21, 2011, at 9:37 AM, Harsh J wrote:
>
>> Hey Johannes,
>>
>> On Wed, Apr 20, 2011 at 3:37 PM, Johannes Zillmann
>> <[email protected]> wrote:
>>> Should it contain all hosts which contain a replica of any of the blocks,
>>> sorted in a way that the hosts which contribute the most data come first ?
>>> Or should it contain only those hosts which were determined as most optimal
>>> regarding the data-locality during the splitting-process.
>>>
>>> F.e. in case (a). Should the location array only contain this one host, or
>>> should it contain all hosts but the one host with all the blocks should
>>> simply be on the first position ?
>>
>> It's better to send all locations for maximal locality, but the order
>> is not considered AFAIK. It's the order of TT heartbeats at the JT that
>> matters, instead.
>>
>> --
>> Harsh J
>
>

--
Harsh J
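P.S. Here is a rough, hypothetical helper (the names are mine, not part of any Hadoop API) showing one way to compute that: given per-host byte coverage for a combined split, it reports only the host(s) that hold 100% of the split's bytes, and falls back to every host when no single node has all of it.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public final class SplitLocations {
  private SplitLocations() {}

  /**
   * @param bytesPerHost hostname -> number of the split's bytes stored on that host
   * @param splitLength  total length of the combined split in bytes
   * @return hosts holding the entire split, or all hosts if none hold it completely
   */
  public static String[] preferFullyLocalHosts(Map<String, Long> bytesPerHost,
                                               long splitLength) {
    List<String> full = new ArrayList<String>();
    for (Map.Entry<String, Long> e : bytesPerHost.entrySet()) {
      if (e.getValue() >= splitLength) {
        full.add(e.getKey());            // e.g. the 100%-local host
      }
    }
    // No fully-local node (say, 60% here and 60% there): report every host and
    // let the scheduler pick whichever of those nodes asks for work first.
    List<String> chosen = full.isEmpty()
        ? new ArrayList<String>(bytesPerHost.keySet())
        : full;
    return chosen.toArray(new String[chosen.size()]);
  }
}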
