Saurabh,

> let me reframe my question: I wanted to know how the JobTracker decides
> the assignment of input splits to TaskTrackers based on a TaskTracker's
> data locality. Where is this policy defined? Is it pluggable?

Sorry, I misunderstood your question then. This code is in
o.a.h.mapred.JobInProgress. It is spread across many methods in the
class, but good starting points are methods like obtainNewMapTask and
obtainNewReduceTask.

At the moment, this policy is not pluggable. But I know there have
been discussions (possibly even a JIRA, though I can't locate one now)
asking for this capability.
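To give a rough idea of what that code does, here is a minimal,
self-contained sketch of the locality preference applied when handing a
map task to a requesting TaskTracker: prefer a split with a replica on
the tracker's own host (node-local), then one in the same rack
(rack-local), then any remaining split. This is an illustrative model,
not the actual Hadoop code; the class names, the "rackN/hostM" topology
encoding, and the method names here are my own invention.

```java
import java.util.*;

public class LocalityModel {
    // A pending map task, described by the hosts holding its input split.
    static class Split {
        final String id;
        final Set<String> hosts;
        Split(String id, String... hosts) {
            this.id = id;
            this.hosts = new HashSet<>(Arrays.asList(hosts));
        }
    }

    // Hypothetical topology encoding: "rackN/hostM" names carry the rack.
    static String rackOf(String host) {
        return host.substring(0, host.indexOf('/'));
    }

    // Pick the best split for a TaskTracker on `host`, removing it from the queue.
    static Split assign(List<Split> pending, String host) {
        Split rackLocal = null, any = null;
        for (Split s : pending) {
            if (s.hosts.contains(host)) { // node-local: best, take immediately
                pending.remove(s);
                return s;
            }
            if (rackLocal == null) {      // remember the first rack-local split
                for (String h : s.hosts) {
                    if (rackOf(h).equals(rackOf(host))) { rackLocal = s; break; }
                }
            }
            if (any == null) any = s;     // fallback: first non-local split
        }
        Split pick = rackLocal != null ? rackLocal : any;
        if (pick != null) pending.remove(pick);
        return pick;
    }

    public static void main(String[] args) {
        List<Split> pending = new ArrayList<>(Arrays.asList(
            new Split("s1", "rack1/hostA", "rack2/hostC"),
            new Split("s2", "rack2/hostD")));
        // hostD asks for work: s2 is node-local and wins despite queue order.
        System.out.println(assign(pending, "rack2/hostD").id); // s2
        // hostC asks next: s1 is node-local for it.
        System.out.println(assign(pending, "rack2/hostC").id); // s1
    }
}
```

The real JobInProgress logic is considerably more involved (it consults
the cluster's topology map, tracks per-node and per-rack task caches,
and handles speculative and failed tasks), but the node-local →
rack-local → any ordering above is the core idea.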

Thanks
Hemanth

>
> On Fri, May 14, 2010 at 7:04 AM, Hemanth Yamijala <yhema...@gmail.com>wrote:
>
>> Saurabh,
>>
>> > I am experimenting with Hadoop. I wanted to ask: is the task
>> > distribution policy of the JobTracker pluggable? If yes, where in
>> > the code tree is it defined?
>> >
>>
>> Take a look at o.a.h.mapred.TaskScheduler. That's the abstract class
>> that needs to be extended to define a new scheduling policy. Also,
>> please do take a look at the existing schedulers that extend this
>> class. There are 3-4 implementations, including the default
>> scheduler, the capacity scheduler, the fair scheduler and the
>> dynamic priority scheduler. It may be worthwhile to see whether your
>> ideas match any of the existing implementations to some degree, and
>> then consider enhancing those as a first option.
>>
>> Thanks
>> Hemanth
>>
>
