At the default log level, Hadoop job logs (the ones you also get in the job's output directory under _logs/history) contain entries like the following:
ReduceAttempt TASK_TYPE="REDUCE" TASKID="tip_200809020551_0008_r_000002" TASK_ATTEMPT_ID="task_200809020551_0008_r_000002_0" START_TIME="1220331166789" HOSTNAME="tracker_foo.bar.com:localhost/127.0.0.1:44755"

ReduceAttempt TASK_TYPE="REDUCE" TASKID="tip_200809020551_0008_r_000002" TASK_ATTEMPT_ID="task_200809020551_0008_r_000002_0" TASK_STATUS="SUCCESS" SHUFFLE_FINISHED="1220332036001" SORT_FINISHED="1220332036014" FINISH_TIME="1220332063254" HOSTNAME="tracker_foo.bar.com:localhost/127.0.0.1:44755"

You get the start time, shuffle finish time, sort finish time, and overall finish time. Similarly, you get start and finish times for MapAttempt entries.

Hope this helps,

Simone

On 03/17/10 12:47, Antonio D'Ettole wrote:
> Hi everybody,
> as part of my project work at school I'm running some Hadoop jobs on a
> cluster. I'd like to measure exactly how long each phase of the process
> takes: mapping, shuffling (ideally divided into copying and sorting) and
> reducing. The tasktracker logs do not seem to supply the start/end times
> for each phase, at least not all of them, even when the log level is set
> to DEBUG.
> Do you have any ideas on how I could work this out?
> Thanks
> Antonio

--
Simone Leo
Distributed Computing group
Advanced Computing and Communications program
CRS4
POLARIS - Building #1
Piscina Manna
I-09010 Pula (CA) - Italy
e-mail: [email protected]
http://www.crs4.it
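To turn those timestamps into per-phase durations, the KEY="VALUE" pairs in each history line can be parsed and subtracted. Here is a minimal sketch in Python (the helper names `parse_entry` and `reduce_phase_durations` are my own, not part of Hadoop; it assumes the attribute format shown in the log excerpt above, with times in milliseconds since the epoch):

```python
import re

# Matches KEY="VALUE" pairs as they appear in Hadoop job history lines.
PAIR = re.compile(r'(\w+)="([^"]*)"')

def parse_entry(line):
    """Return a dict of the KEY="VALUE" fields in one history entry."""
    return dict(PAIR.findall(line))

def reduce_phase_durations(start_entry, finish_entry):
    """Compute shuffle/sort/reduce durations (in ms) for one reduce attempt,
    given its start entry and its matching finish (SUCCESS) entry."""
    start   = int(start_entry["START_TIME"])
    shuffle = int(finish_entry["SHUFFLE_FINISHED"])
    sort    = int(finish_entry["SORT_FINISHED"])
    finish  = int(finish_entry["FINISH_TIME"])
    return {
        "shuffle_ms": shuffle - start,   # copy + merge of map outputs
        "sort_ms":    sort - shuffle,    # final sort
        "reduce_ms":  finish - sort,     # user reduce() calls + output
    }

# The two entries from the log excerpt above:
start = parse_entry('ReduceAttempt TASK_TYPE="REDUCE" '
                    'TASK_ATTEMPT_ID="task_200809020551_0008_r_000002_0" '
                    'START_TIME="1220331166789"')
finish = parse_entry('ReduceAttempt TASK_TYPE="REDUCE" '
                     'TASK_ATTEMPT_ID="task_200809020551_0008_r_000002_0" '
                     'TASK_STATUS="SUCCESS" SHUFFLE_FINISHED="1220332036001" '
                     'SORT_FINISHED="1220332036014" FINISH_TIME="1220332063254"')
print(reduce_phase_durations(start, finish))
# {'shuffle_ms': 869212, 'sort_ms': 13, 'reduce_ms': 27240}
```

For a whole job you would group entries by TASK_ATTEMPT_ID and pair each start record with its SUCCESS record before subtracting.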
