Hrmmm. I can tell init/execution at the job level, but I don't know how to figure that out at the individual map task level. What would be the best way for me to determine that?
-Sean

On Wed, Mar 4, 2009 at 12:13 PM, Runping Qi <runping...@gmail.com> wrote:

> Do you know the breakdown of the time a mapper task takes to initialize
> and to execute the map function?
>
> On Wed, Mar 4, 2009 at 8:44 AM, Sean Laurent <organicveg...@gmail.com> wrote:
>
> > On Tue, Mar 3, 2009 at 10:14 PM, Amar Kamat <ama...@yahoo-inc.com> wrote:
> >
> > > Yeah. Maybe it's not a problem with the JobTracker. Can you check (via
> > > job history) what the best and the worst task runtimes are? You can
> > > analyze the jobs after they complete.
> >
> > Okay, I ran the same job 35 times last night. Each job was exactly
> > identical - it parsed 1000 identical files that were already stored in
> > HDFS via a map task (no reduce). Like all of my previous tests, each
> > successive run took longer than the previous run.
> >
> > Looking at the job history, the first run was the fastest; it took a
> > total of 2mins 28sec (setup: 2 secs, map: 2min 22sec, cleanup: 0sec).
> > The last run was the slowest; it took a total of 22mins 31sec (setup:
> > 16sec, map: 22mins 14sec, cleanup: 16sec).
> >
> > Memory usage on the JT/NN machine, as reported by sar, slowly increased
> > over the 7 hour window. Memory usage on a randomly selected DN/TT also
> > steadily increased over the 7 hour window but far more rapidly. We also
> > looked at I/O usage and CPU utilization on both the JT/NN machine and
> > the same randomly selected DN/TT - nothing out of the ordinary. I/O
> > waits (both from the I/O subsystem level perspective and from the CPU's
> > perspective) were consistently low over the 7 hour window and did not
> > fluctuate significantly on any of the machines. CPU utilization on the
> > JT/NN was practically non-existent and hovered between 40%-60% on the
> > DN/TT.
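
For reference, the per-phase slowdown between the first and last runs can be worked out from the job-history figures quoted above. This is only a rough sketch: the helper name `to_seconds` and the dictionary layout are made up for illustration, and the durations are copied as reported in the thread.

```python
import re

def to_seconds(text):
    """Convert a duration such as '2mins 28sec' to whole seconds."""
    minutes = re.search(r"(\d+)\s*min", text)
    seconds = re.search(r"(\d+)\s*sec", text)
    total = int(minutes.group(1)) * 60 if minutes else 0
    total += int(seconds.group(1)) if seconds else 0
    return total

# Durations as reported in the job history for the first and last of the 35 runs.
first = {"total": "2mins 28sec", "setup": "2 secs", "map": "2min 22sec", "cleanup": "0sec"}
last = {"total": "22mins 31sec", "setup": "16sec", "map": "22mins 14sec", "cleanup": "16sec"}

for phase in first:
    a, b = to_seconds(first[phase]), to_seconds(last[phase])
    # A phase that took 0 seconds in the first run has no meaningful ratio.
    ratio = f"{b / a:.1f}x" if a else "n/a"
    print(f"{phase:8s} {a:5d}s -> {b:5d}s  ({ratio})")
```

By this arithmetic the last run took roughly nine times as long as the first (148 seconds versus 1351 seconds in total), with the map phase accounting for nearly all of the increase.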