Hello again,

The scheduler does take locality into account, with a reasonable
algorithm. However, the String[] of locations describes one given
logical split as a whole. The scheduler does not care what your split
actually contains (a single block, multiple blocks, or offsets); it
only looks at 'where' the particular split can favorably run, according
to the location hostnames supplied. For every host supplied, it adds an
entry to a cache of <Node, Set<Local_TIPs>>.
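
To make that concrete, here is a minimal, purely illustrative Java
sketch of that cache idea (the class and method names are mine, not the
actual JobTracker/JobInProgress code): one map entry per reported
hostname, holding the set of task IDs considered local to that node.

import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified model of the <Node, Set<Local_TIPs>> cache described above.
public class LocalityCacheSketch {
    private final Map<String, Set<Integer>> nodeToLocalTips = new HashMap<>();

    // Called once per split: register its task against every reported host.
    public void registerSplit(int taskId, String[] splitLocations) {
        for (String host : splitLocations) {
            nodeToLocalTips
                .computeIfAbsent(host, h -> new HashSet<>())
                .add(taskId);
        }
    }

    // When a tasktracker on 'host' heartbeats in and asks for work, any
    // task in this set counts as data-local for it.
    public Set<Integer> localTasksFor(String host) {
        return nodeToLocalTips.getOrDefault(host, Collections.emptySet());
    }
}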

For a split that has two file offsets with 100% and 60% locality,
supplying the 100%-local host alone would be the best idea.
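
If it helps, here is a rough sketch of how you might pick those
locations in your own split code (a hypothetical helper, not a Hadoop
API; you would compute bytesPerHost yourself from the block locations
of the files making up the combined split):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative only: report just the host(s) holding (nearly) all of the
// combined split's bytes, so the scheduler prefers the most-local node.
public class BestLocationPicker {
    public static String[] pickLocations(Map<String, Long> bytesPerHost,
                                         long totalSplitBytes,
                                         double minFraction) { // e.g. 0.9
        List<String> best = new ArrayList<>();
        for (Map.Entry<String, Long> e : bytesPerHost.entrySet()) {
            if ((double) e.getValue() / totalSplitBytes >= minFraction) {
                best.add(e.getKey());
            }
        }
        // If no host crosses the threshold, fall back to all of them so the
        // split still carries some locality hint.
        if (best.isEmpty()) {
            best.addAll(bytesPerHost.keySet());
        }
        return best.toArray(new String[0]);
    }
}

You would then return that array from your split's getLocations().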

Have I understood your question right here?

On Thu, Apr 21, 2011 at 6:54 PM, Johannes Zillmann
<[email protected]> wrote:
> So how does inputSplit#getLocations() influence the task distribution? I
> assume a split is preferentially assigned to a tasktracker which matches
> one of its locations!?
> Assume we have a combined split, and that there is one location with 100%
> data-locality and one location with 60% data-locality.
> If the task-scheduler doesn't care about the order of the specified
> locations, wouldn't it be better to specify only the 100% location as the
> split location, so that it's more likely the split is assigned to that host?
>
>
> Johannes
>
> On Apr 21, 2011, at 9:37 AM, Harsh J wrote:
>
>> Hey Johannes,
>>
>> On Wed, Apr 20, 2011 at 3:37 PM, Johannes Zillmann
>> <[email protected]> wrote:
>>> Should it contain all hosts which contain a replica of any of the blocks,
>>> sorted so that the hosts which contribute the most data come first?
>>> Or should it contain only those hosts which were determined to be most
>>> optimal regarding data-locality during the splitting process?
>>>
>>> For example, in case (a): should the location array contain only this one
>>> host, or should it contain all hosts, with the one host holding all the
>>> blocks simply in the first position?
>>
>> It's better to send all locations for maximal locality, but the order
>> is not considered, AFAIK. It's the order of TT heartbeats at the JT that
>> matters instead.
>>
>> --
>> Harsh J
>
>



-- 
Harsh J
