The copy phase fetches the map outputs. It may hang for a while if there are no newly completed map outputs to fetch yet.
You can raise your reducers' slowstart value so that they do not spend so many cycles waiting, and instead start at 80-90% of map completions rather than the default 5%. This helps your overall MR performance if you run multiple jobs at a time, as the reduce slots aren't wasted.

On Wed, Jan 25, 2012 at 3:34 PM, praveenesh kumar <[email protected]> wrote:
> Hey,
>
> Can anyone explain me what is reduce > copy phase in the reducer section ?
> The (K,List(V)), is passed to the reducer. Is reduce > copy representing
> copying of K,List(V) on the reducer from all mappers ?
>
> I am monitoring my jobs on the cluster, using Jobtracker url.
> I am seeing for most of my reducing jobs, something like this :
>
> task_201201250352_0001_r_000000 31.05% reduce > copy (395 of 424 at 0.00 MB/s) 25-Jan-2012 03:54:06
>
> task_201201250352_0001_r_000001 30.73% reduce > copy (391 of 424 at 0.00 MB/s) 25-Jan-2012 03:54:06
>
> task_201201250352_0001_r_000002 30.89% reduce > copy (393 of 424 at 0.00 MB/s) 25-Jan-2012 03:54:06
>
> ............................
>
> Can anyone explain me why the speed is 0.00 MB/s. Job is running fine.
> Is it because this reduce > copy is happening on the same machine.
>
> Thanks,
> Praveenesh

--
Harsh J
Customer Ops. Engineer, Cloudera
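
To put rough numbers on the slowstart advice above, here is a minimal sketch. It assumes the setting in question is the MRv1 property mapred.reduce.slowstart.completed.maps (a fraction of completed maps, default 0.05) and that the threshold is rounded up to a whole number of maps. Both are assumptions to check against your Hadoop version, and the function names are hypothetical helpers, not part of Hadoop.

```python
import math

# Assumed MRv1 property name and default value; verify for your Hadoop version.
SLOWSTART_PROPERTY = "mapred.reduce.slowstart.completed.maps"
DEFAULT_SLOWSTART = 0.05


def maps_before_reducers_start(total_maps, slowstart=DEFAULT_SLOWSTART):
    """Maps that must complete before reducers are scheduled.

    Assumes the threshold is ceil(slowstart * total_maps). The product is
    rounded first to guard against floating-point noise.
    """
    if not 0.0 <= slowstart <= 1.0:
        raise ValueError("slowstart must be a fraction between 0 and 1")
    return math.ceil(round(slowstart * total_maps, 9))


def maps_outstanding_at_start(total_maps, slowstart=DEFAULT_SLOWSTART):
    """Maps still unfinished when reducers begin their copy phase."""
    return total_maps - maps_before_reducers_start(total_maps, slowstart)


if __name__ == "__main__":
    total = 424  # number of map outputs in the quoted job
    for fraction in (0.05, 0.80, 0.90):
        print(
            f"{SLOWSTART_PROPERTY}={fraction:.2f}: reducers start after "
            f"{maps_before_reducers_start(total, fraction)} maps, with "
            f"{maps_outstanding_at_start(total, fraction)} maps still to finish"
        )
```

Under these assumptions, a job with 424 maps launches its reducers after 22 maps at the default, leaving 402 map outputs that the copy phase has to wait for while holding reduce slots. At 0.80 the reducers start after 340 maps and only 84 outputs remain outstanding.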
