[jira] Commented: (MAPREDUCE-964) Inaccurate values in jobSummary logs

Sreekanth Ramakrishnan (JIRA) Thu, 17 Sep 2009 23:55:22 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757008#action_12757008
 ]


Sreekanth Ramakrishnan commented on MAPREDUCE-964:
--------------------------------------------------

The reason for the negative values being present in the JobSummary results from 
the taskStatus not having the finishTime field not being set in the taskStatus. 
This happens especially when the tasks which are being killed when they just 
have finished, resulting in a race condition. If look at the attempt which is 
mentioned has actually finished and same time a kill task action has been recv 
for same attempt at the same time.


Had modified the code to put in the debug statement following is example of 
task which upset the metering:

{noformat}
2009-09-18 06:18:32,001 INFO org.apache.hadoop.mapred.JobInProgress: TaskDebug 
attemptId : attempt_200909180346_0004_m_000575_0 slots : SLOTS_MILLIS_MAPS 
tip.numslots is: 1 difference to add : -1253254705577 status start : 
1253254705577 status end time : 0 
{noformat}


TT logs :
{noformat}
2009-09-18 06:18:01,200 INFO org.apache.hadoop.mapred.TaskTracker: 
LaunchTaskAction (registerTask): attempt_200909180346_0004_m_000575_0 task's 
state:UNASSIGNED
2009-09-18 06:18:01,200 INFO org.apache.hadoop.mapred.TaskTracker: Trying to 
launch : attempt_200909180346_0004_m_000575_0 which needs 1 slots
2009-09-18 06:18:01,200 INFO org.apache.hadoop.mapred.TaskTracker: In 
TaskLauncher, current free slots : 1 and trying to launch 
attempt_200909180346_0004_m_000575_0 which needs 1 slots
2009-09-18 06:18:02,750 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: 
jvm_200909180346_0004_m_1883468461 given task: 
attempt_200909180346_0004_m_000575_0
2009-09-18 06:18:09,034 INFO org.apache.hadoop.mapred.TaskTracker: 
attempt_200909180346_0004_m_000575_0 0.18553433% 
xxxx/my_reliability_test_input/part-00032:335544320+67108864
2009-09-18 06:18:12,040 INFO org.apache.hadoop.mapred.TaskTracker: 
attempt_200909180346_0004_m_000575_0 0.25399202% 
xxx/my_reliability_test_input/part-00032:335544320+67108864
2009-09-18 06:18:15,317 INFO org.apache.hadoop.mapred.TaskTracker: 
attempt_200909180346_0004_m_000575_0 0.3378629% 
xxx/my_reliability_test_input/part-00032:335544320+67108864
2009-09-18 06:18:15,319 INFO org.apache.hadoop.mapred.TaskTracker: Received 
KillTaskAction for task: attempt_200909180346_0004_m_000575_0
2009-09-18 06:18:15,319 INFO org.apache.hadoop.mapred.TaskTracker: About to 
purge task: attempt_200909180346_0004_m_000575_0
2009-09-18 06:18:20,413 INFO org.apache.hadoop.mapred.TaskRunner: 
attempt_200909180346_0004_m_000575_0 done; removing files.
2009-09-18 06:18:20,415 INFO org.apache.hadoop.mapred.IndexCache: Map ID 
attempt_200909180346_0004_m_000575_0 not found in cache
2009-09-18 06:18:25,511 INFO org.apache.hadoop.mapred.TaskRunner: 
attempt_200909180346_0004_m_000575_0 done; removing files.
2009-09-18 06:18:25,515 INFO org.apache.hadoop.mapred.TaskTracker: 
LaunchTaskAction (registerTask): attempt_200909180346_0004_m_000575_0 task's 
state:FAILED_UNCLEAN
2009-09-18 06:18:25,516 INFO org.apache.hadoop.mapred.TaskTracker: Trying to 
launch : attempt_200909180346_0004_m_000575_0 which needs 1 slots
2009-09-18 06:18:25,516 INFO org.apache.hadoop.mapred.TaskTracker: In 
TaskLauncher, current free slots : 2 and trying to launch 
attempt_200909180346_0004_m_000575_0 which needs 1 slots
2009-09-18 06:18:26,651 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: 
jvm_200909180346_0004_m_-1834354161 given task: 
attempt_200909180346_0004_m_000575_0
2009-09-18 06:18:26,888 INFO org.apache.hadoop.mapred.TaskTracker: Received 
KillTaskAction for task: attempt_200909180346_0004_m_000575_0
2009-09-18 06:18:26,888 INFO org.apache.hadoop.mapred.TaskTracker: About to 
purge task: attempt_200909180346_0004_m_000575_0
2009-09-18 06:18:31,986 INFO org.apache.hadoop.mapred.TaskTracker: 
attempt_200909180346_0004_m_000575_0 0.0% 
2009-09-18 06:18:31,986 INFO org.apache.hadoop.mapred.TaskTracker: 
attempt_200909180346_0004_m_000575_0 Ignoring status-update since runState: 
FAILED
2009-09-18 06:18:31,989 INFO org.apache.hadoop.mapred.TaskRunner: 
attempt_200909180346_0004_m_000575_0 done; removing files.
2009-09-18 06:18:31,990 WARN org.apache.hadoop.ipc.Server: IPC Server 
Responder, call statusUpdate(attempt_200909180346_0004_m_000575_0, 
org.apache.hadoop.mapred.maptasksta...@4e69b84c) from 127.0.0.1:51146: output 
error
2009-09-18 06:18:31,995 INFO org.apache.hadoop.mapred.IndexCache: Map ID 
attempt_200909180346_0004_m_000575_0 not found in cache
2009-09-18 06:18:34,992 INFO org.apache.hadoop.mapred.TaskRunner: 
attempt_200909180346_0004_m_000575_0 done; removing files.
{noformat}

> Inaccurate values in jobSummary logs
> ------------------------------------
>
>                 Key: MAPREDUCE-964
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-964
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.1
>            Reporter: Rajiv Chittajallu
>            Priority: Critical
>
> For some jobs the mapSlotSeconds is incorrect.
> negative value
> 09/09/01 18:31:44 INFOmapred.JobInProgress$JobSummary: 
> jobId=job_200908270718_4568,submitTime=1251823543976,launchTime=1251823554310,finishTime=1251829904565,
>             
> numMaps=7965,numSlotsPerMap=1,numReduces=40,numSlotsPerReduce=1,user=wile,queue=runner,status=SUCCEEDED,
>          
> mapSlotSeconds=-2503133523,reduceSlotsSeconds=186536,clusterMapCapacity=11262,clusterReduceCapacity=3754
> or too high
> 09/09/02 23:59:57 INFO mapred.JobInProgress$JobSummary: 
> jobId=job_200908270718_5861,submitTime=1251935672924,launchTime=1251935687698,finishTime=1251935997949,
>             
> numMaps=1026,numSlotsPerMap=1,numReduces=10,numSlotsPerReduce=1,user=dfsload,queue=gridops,status=SUCCEEDED,
>          
> mapSlotSeconds=1251949742,reduceSlotsSeconds=537,clusterMapCapacity=11262,clusterReduceCapacity=3754

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (MAPREDUCE-964) Inaccurate values in jobSummary logs

Reply via email to