Hmm... but how can the scheduler affect the performance of a Mapper if there are no competing jobs?
I thought the scheduler only affected how resources are divided among separate jobs. In my example, there are 2 mappers, 2+n files, and 1 job.

Jay Vyas
http://jayunit100.blogspot.com

On Dec 6, 2012, at 4:39 AM, Bertrand Dechoux <decho...@gmail.com> wrote:

> The short answer is yes, it can be worth it, because your job can finish
> faster if you are not only allowing local mappers. But this is of course a
> trade-off. The best performance (but not latency) can be obtained when
> using only local mappers. You should read about delay scheduling, which
> allows the user to pick what is 'best'. Fair scheduler has it for
> Hadoop 1, and capacity scheduler has it, but for Hadoop 2.
>
> Regards
>
> Bertrand
>
> On Thu, Dec 6, 2012 at 6:14 AM, <jayunit...@gmail.com> wrote:
>
>> If there is a job with files f1 and f2, and a Mapper (m1) is running
>> against a file (f2) which is far from the local machine (m1), will the
>> overhead of copying f2 over to m1 be worth it?
>>
>> That is, is the amount of resources required to read data off a
>> remote machine (m2) worth it? Or would it be better if that remote (m2)
>> now simply processed both files (f1, f2) in turn?
>>
>> Jay Vyas
>> http://jayunit100.blogspot.com
>
> --
> Bertrand Dechoux
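
To make the trade-off in the thread concrete, here is a rough back-of-the-envelope sketch. It is not Hadoop code and does not model any real scheduler. It assumes both files live on m2, that every split takes the same time to process locally, and that a remote read adds a fixed penalty; the function names and the numbers are made up for illustration.

```python
# Toy model of the two-mapper example: f1 and f2 are both stored on m2.
# All names and numbers are hypothetical; this is not a Hadoop API.

def local_only_makespan(split_secs, n_splits=2):
    """m2 processes every split itself, one after another (all reads local)."""
    return split_secs * n_splits


def with_remote_makespan(split_secs, remote_penalty_secs):
    """m2 processes f1 locally while m1 reads f2 over the network in parallel."""
    local_finish = split_secs
    remote_finish = split_secs + remote_penalty_secs
    # The job is done when the slower of the two mappers is done.
    return max(local_finish, remote_finish)


def remote_read_is_faster(split_secs, remote_penalty_secs, n_splits=2):
    """True when allowing the non-local mapper shortens the job's wall-clock time."""
    return with_remote_makespan(split_secs, remote_penalty_secs) < local_only_makespan(
        split_secs, n_splits
    )


if __name__ == "__main__":
    # Assumed: 60 s per split locally, 20 s extra for a remote read.
    print(local_only_makespan(60))        # sequential, local reads only
    print(with_remote_makespan(60, 20))   # parallel, one remote read
    print(remote_read_is_faster(60, 20))
```

Under these assumptions the job finishes sooner with the remote mapper whenever the remote penalty is smaller than the time to process one more split locally, at the cost of extra network traffic. That matches the point above that local-only mappers give the best resource efficiency but not the best latency.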