So how does InputSplit#getLocations() influence the task distribution? I assume a split is preferentially assigned to a TaskTracker that matches one of its locations? Assume we have a combined split with one location that has 100% data locality and another that has 60% data locality. If the task scheduler doesn't care about the order of the specified locations, wouldn't it be better to specify only the 100% location as the split location, so that it's more likely that the split is assigned to that host?
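
To make the two options concrete, here is a minimal sketch of how the candidate location arrays could be computed for a combined split. It is plain Java and deliberately does not use the Hadoop API; `SplitLocationRanking`, `Block`, the host names and the block sizes are all hypothetical and only illustrate the 100% / 60% scenario above. It says nothing about what the scheduler actually does with the result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SplitLocationRanking {

    /** Hypothetical stand-in for one block of a combined split. */
    public static final class Block {
        final long length;          // block length in bytes (or any unit)
        final List<String> hosts;   // hosts holding a replica of this block

        public Block(long length, String... hosts) {
            this.length = length;
            this.hosts = Arrays.asList(hosts);
        }
    }

    /** Sums, per host, how much of the split's data is local to that host. */
    public static Map<String, Long> localBytesPerHost(List<Block> blocks) {
        Map<String, Long> bytes = new LinkedHashMap<>();
        for (Block b : blocks) {
            for (String host : b.hosts) {
                bytes.merge(host, b.length, Long::sum);
            }
        }
        return bytes;
    }

    /** Option 1: all hosts, the one contributing the most data first. */
    public static List<String> rankedLocations(List<Block> blocks) {
        Map<String, Long> bytes = localBytesPerHost(blocks);
        List<String> hosts = new ArrayList<>(bytes.keySet());
        // Descending by local bytes; List.sort is stable, so ties keep first-seen order.
        hosts.sort((a, b) -> Long.compare(bytes.get(b), bytes.get(a)));
        return hosts;
    }

    /** Option 2: only the host(s) with the highest locality. */
    public static List<String> bestLocations(List<Block> blocks) {
        Map<String, Long> bytes = localBytesPerHost(blocks);
        long max = 0;
        for (long v : bytes.values()) {
            max = Math.max(max, v);
        }
        List<String> best = new ArrayList<>();
        for (Map.Entry<String, Long> e : bytes.entrySet()) {
            if (e.getValue() == max) {
                best.add(e.getKey());
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Five equal blocks: hostA holds all of them (100%), hostB three (60%), hostC two (40%).
        List<Block> blocks = Arrays.asList(
                new Block(64, "hostA", "hostB"),
                new Block(64, "hostA", "hostB"),
                new Block(64, "hostA", "hostB"),
                new Block(64, "hostA", "hostC"),
                new Block(64, "hostA", "hostC"));

        System.out.println(rankedLocations(blocks)); // [hostA, hostB, hostC]
        System.out.println(bestLocations(blocks));   // [hostA]
    }
}
```

The question is then whether getLocations() should return something like the first list or the second.
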

Johannes

On Apr 21, 2011, at 9:37 AM, Harsh J wrote:

> Hey Johannes,
>
> On Wed, Apr 20, 2011 at 3:37 PM, Johannes Zillmann
> <[email protected]> wrote:
>> Should it contain all hosts which contains a replica of any of the blocks,
>> sorted in a way the the hosts which contributes the most data come first ?
>> Or should it contains only those host which were determined as most optimal
>> regarding the data-locality during the splitting-process.
>>
>> F.e. in case (a). Should the location array only contain this one host, or
>> should it contain all hosts but the one host with all the blocks should
>> simply be on the first position ?
>
> Its better to send all locations for maximal locality, but the order
> is not considered AFAIK. Its the order of TT heartbeats at the JT that
> matters, instead.
>
> --
> Harsh J
