So how does inputSplit#getLocations() influence task distribution? I
assume a split is preferentially assigned to a tasktracker that matches one
of its locations?
Suppose we have a combined split with one location that has 100%
data locality and another location with 60% data locality.
If the task scheduler doesn't care about the order of the specified locations,
wouldn't it be better to list only the 100% location as the split location, so
that the split is more likely to be assigned to that host?
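To make the trade-off in this question concrete, here is a deliberately simplified, hypothetical model of heartbeat-driven assignment. The class `HeartbeatModel`, the `Split` record and the host names are made up for illustration; this is not the JobTracker's actual code, and it ignores rack locality, multiple slots and scheduler-specific policies. Each heartbeating tasktracker receives a pending split that lists it as a location if one exists, otherwise any pending split.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified model of locality-aware map assignment.
// Not the real JobTracker logic: rack locality, slots and delays are ignored.
final class HeartbeatModel {

    /** A split and the hosts it reports via getLocations(). */
    record Split(String name, List<String> locations) {}

    /** Hands out at most one split per heartbeat, in heartbeat order. */
    static List<String> assign(List<Split> splits, List<String> heartbeatOrder) {
        List<Split> pending = new ArrayList<>(splits);
        List<String> log = new ArrayList<>();
        for (String tracker : heartbeatOrder) {
            if (pending.isEmpty()) break;
            Split chosen = null;
            for (Split s : pending) {
                // Only membership is checked; the position inside locations() is ignored.
                if (s.locations().contains(tracker)) { chosen = s; break; }
            }
            boolean local = chosen != null;
            if (!local) chosen = pending.get(0); // fall back to a non-local split
            pending.remove(chosen);
            log.add(chosen.name() + "->" + tracker + (local ? " (local)" : " (remote)"));
        }
        return log;
    }
}
```

In this model, if `hostB` (60%) heartbeats before `hostA` (100%), listing both hosts gives `hostB` a data-local assignment, while listing only `hostA` does not steer the split to `hostA`; it just turns `hostB`'s assignment into a remote one. Whether a real scheduler would instead hold non-local work back depends on the scheduler in use, which this sketch does not capture.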


Johannes

On Apr 21, 2011, at 9:37 AM, Harsh J wrote:

> Hey Johannes,
> 
> On Wed, Apr 20, 2011 at 3:37 PM, Johannes Zillmann wrote:
>> Should it contain all hosts that hold a replica of any of the blocks,
>> sorted so that the hosts contributing the most data come first?
>> Or should it contain only those hosts that were determined to be optimal
>> for data locality during the splitting process?
>> 
>> For example, in case (a): should the location array contain only this one
>> host, or should it contain all hosts, with the host holding all the blocks
>> simply in the first position?
> 
> It's better to send all locations for maximal locality, but the order
> is not considered, AFAIK. It's the order of TT heartbeats at the JT that
> matters instead.
> 
> -- 
> Harsh J
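Following the advice to report all locations, a combined split could derive its location array from its blocks. The sketch below is a hypothetical helper (`CombinedSplitLocations` and `Block` are not part of the Hadoop API): it sums the bytes each host stores for the split and returns every replica host, most bytes first, breaking ties by host name. If the order is ignored at the JT, as suggested above, the sorting costs nothing; it only matters if some scheduler does read the order, which is an assumption here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for a combined split; not part of the Hadoop API.
final class CombinedSplitLocations {

    /** One block of the combined split: its length and the hosts holding a replica. */
    record Block(long length, List<String> hosts) {}

    /** Returns every replica host, ordered by total bytes stored for this split (descending). */
    static String[] locations(List<Block> blocks) {
        Map<String, Long> bytesPerHost = new HashMap<>();
        for (Block b : blocks) {
            for (String host : b.hosts()) {
                bytesPerHost.merge(host, b.length(), Long::sum);
            }
        }
        List<Map.Entry<String, Long>> entries = new ArrayList<>(bytesPerHost.entrySet());
        // Most bytes first; ties broken by host name so the result is deterministic.
        entries.sort(Map.Entry.<String, Long>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Long>comparingByKey()));
        String[] result = new String[entries.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = entries.get(i).getKey();
        }
        return result;
    }
}
```

For blocks of 64, 64 and 32 bytes replicated on {hostA, hostB}, {hostA, hostC} and {hostA, hostB}, hostA holds all 160 bytes (100%), hostB 96 bytes (60%) and hostC 64 bytes (40%), so the helper returns hostA, hostB, hostC: all hosts are reported, and the fully local host comes first.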
