> > At the default log level, Hadoop job logs (the ones you also get in
> > the job's output directory under _logs/history)

Thanks Simone, that's exactly what I was looking for.

> Look at the job history logs. They break down the times for each task.

I understand you guys are talking about the same thing? I'm using the
file in /outputDir/_logs/history. Interestingly, before you told me, I
was convinced that was actually a .jar archive, so it took me a little
while to figure out where these history logs were :)

Thanks again folks!

Antonio

On Wed, Mar 17, 2010 at 4:45 PM, Owen O'Malley <[email protected]> wrote:
>
> On Mar 17, 2010, at 4:47 AM, Antonio D'Ettole wrote:
>
>> Hi everybody,
>> as part of my project work at school I'm running some Hadoop jobs on a
>> cluster. I'd like to measure exactly how long each phase of the process
>> takes: mapping, shuffling (ideally divided in copying and sorting) and
>> reducing.
>
> Look at the job history logs. They break down the times for each task. You
> need to run a script to aggregate them. You can see an example of the
> aggregation on my petabyte sort description:
>
> http://developer.yahoo.net/blogs/hadoop/2009/05/hadoop_sorts_a_petabyte_in_162.html
>
> -- Owen
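[Editor's note: the aggregation script Owen refers to is not included in the thread. Below is a minimal sketch of what such a script could look like, assuming the pre-YARN JobHistory format in which each event line is a sequence of KEY="value" pairs. The specific field names (START_TIME, SHUFFLE_FINISHED, SORT_FINISHED, FINISH_TIME) and the sample lines are illustrative, not copied from a real log.]

```python
import re

# Matches one KEY="value" pair in a history-log event line.
FIELD_RE = re.compile(r'(\w+)="([^"]*)"')

def parse_line(line):
    """Return a dict of KEY -> value for one history-log event line."""
    return dict(FIELD_RE.findall(line))

def phase_times(lines):
    """Sum per-phase durations (ms) across all successful attempts.

    Map time is finish - start; for reducers, shuffle runs from start to
    SHUFFLE_FINISHED, sort from SHUFFLE_FINISHED to SORT_FINISHED, and
    the reduce function itself from SORT_FINISHED to FINISH_TIME.
    """
    totals = {"map": 0, "shuffle": 0, "sort": 0, "reduce": 0}
    for line in lines:
        f = parse_line(line)
        if f.get("TASK_STATUS") != "SUCCESS":
            continue  # skip failed/killed attempts
        if line.startswith("MapAttempt"):
            totals["map"] += int(f["FINISH_TIME"]) - int(f["START_TIME"])
        elif line.startswith("ReduceAttempt"):
            totals["shuffle"] += int(f["SHUFFLE_FINISHED"]) - int(f["START_TIME"])
            totals["sort"] += int(f["SORT_FINISHED"]) - int(f["SHUFFLE_FINISHED"])
            totals["reduce"] += int(f["FINISH_TIME"]) - int(f["SORT_FINISHED"])
    return totals

# Hypothetical sample lines standing in for the _logs/history file contents.
SAMPLE = [
    'MapAttempt TASK_TYPE="MAP" TASKID="task_m_000000" '
    'START_TIME="1000" FINISH_TIME="5000" TASK_STATUS="SUCCESS"',
    'ReduceAttempt TASK_TYPE="REDUCE" TASKID="task_r_000000" '
    'START_TIME="6000" SHUFFLE_FINISHED="9000" SORT_FINISHED="10000" '
    'FINISH_TIME="12000" TASK_STATUS="SUCCESS"',
]

if __name__ == "__main__":
    print(phase_times(SAMPLE))
```

Running it against a real history file would just mean replacing SAMPLE with the lines of /outputDir/_logs/history; timestamps in those logs are epoch milliseconds, so the totals come out in milliseconds too.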
