[jira] Commented: (MAPREDUCE-733) When running ant test TestTrackerBlacklistAcrossJobs, losing task tracker heartbeat exception occurs.

Hemanth Yamijala (JIRA) Wed, 08 Jul 2009 20:50:43 -0700

    [ https://issues.apache.org/jira/browse/MAPREDUCE-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729031#action_12729031 ]


Hemanth Yamijala commented on MAPREDUCE-733:
--------------------------------------------

bq. One more thing I've observed while going through this is that reservations 
are not removed on a TaskTracker that is globally blacklisted either via large 
task-failure count or via unhealthy status.

I had filed MAPREDUCE-682 to track this. Arun, if you remember, we discussed 
this a couple of days ago and decided it was not a major problem. For a short 
while these reservations might remain on the blacklisted nodes and count 
against the job, but the healthy nodes can pick up and run (in the next wave) 
the tasks of the job that might otherwise have run on the blacklisted 
trackers.
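
To make the reservation bookkeeping concrete, here is a minimal sketch of the 
kind of check that produces the "already has slots reserved for null" message 
in the trace below. It is not the Hadoop TaskTracker class and not the 
attached patch: the class SlotReservation and the methods reserve, unreserve, 
unreserveIfHeld and getReservedSlots are hypothetical names chosen for 
illustration only. The strict unreserve throws when the job does not hold the 
reservation; the tolerant variant is one possible way to let the caller carry 
on, assuming a missing reservation is harmless in that code path.

```java
// Hypothetical illustration only; not the actual Hadoop implementation.
final class SlotReservation {
    private final String trackerName;
    private String reservedJobId; // null when no job holds a reservation
    private int reservedSlots;

    SlotReservation(String trackerName) {
        this.trackerName = trackerName;
    }

    // Record that jobId holds `slots` slots on this tracker.
    void reserve(String jobId, int slots) {
        if (reservedJobId != null && !reservedJobId.equals(jobId)) {
            throw new IllegalStateException(trackerName
                + " already has slots reserved for " + reservedJobId
                + "; being asked to reserve for " + jobId);
        }
        reservedJobId = jobId;
        reservedSlots = slots;
    }

    // Strict variant: fails if jobId does not hold the reservation,
    // including the case where nothing is reserved (reservedJobId == null).
    void unreserve(String jobId) {
        if (!jobId.equals(reservedJobId)) {
            throw new IllegalStateException(trackerName
                + " already has slots reserved for " + reservedJobId
                + "; being asked to un-reserve for " + jobId);
        }
        reservedJobId = null;
        reservedSlots = 0;
    }

    // Tolerant variant: returns false instead of throwing when jobId
    // holds no reservation, so the caller can continue its normal flow.
    boolean unreserveIfHeld(String jobId) {
        if (!jobId.equals(reservedJobId)) {
            return false;
        }
        reservedJobId = null;
        reservedSlots = 0;
        return true;
    }

    int getReservedSlots() {
        return reservedSlots;
    }
}
```

With this sketch, calling unreserve for a job on a tracker with no reservation 
throws, while unreserveIfHeld simply returns false. Whether silently ignoring 
the missing reservation is the right behaviour depends on the invariants the 
JobTracker wants to enforce, which this sketch does not settle.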

> When running ant test TestTrackerBlacklistAcrossJobs, losing task tracker 
> heartbeat exception occurs. 
> ------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-733
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-733
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.21.0
>            Reporter: Iyappan Srinivasan
>            Assignee: Arun C Murthy
>             Fix For: 0.21.0
>
>         Attachments: MAPREDUCE-733_0_20090708.patch, 
> MAPREDUCE-733_0_20090708_yhadoop20.patch
>
>
> When running ant test TestTrackerBlacklistAcrossJobs, losing task tracker 
> heartbeat. 
> It seems that when a task tracker is killed, an exception is thrown. Instead 
> it should be caught and processed so that the rest of the flow can go through.
> 2009-07-08 11:58:26,116 INFO  ipc.Server (Server.java:run(973)) - IPC Server 
> handler 7 on 40193, call 
> heartbeat(org.apache.hadoop.mapred.tasktrackersta...@13ec758, false, false, 
> true, 6) from 127.0.0.1:40200: error: java.io.IOException: 
> java.lang.RuntimeException: tracker_host1.rack.com:localhost/127.0.0.1:40197 
> already has slots reserved for null; being asked to un-reserve for 
> job_200907081158_0001
> java.io.IOException: java.lang.RuntimeException: 
> tracker_host1.rack.com:localhost/127.0.0.1:40197 already has slots reserved 
> for null; being asked to un-reserve for job_200907081158_0001
>         at 
> org.apache.hadoop.mapreduce.server.jobtracker.TaskTracker.unreserveSlots(TaskTracker.java:162)
>         at 
> org.apache.hadoop.mapred.JobInProgress.addTrackerTaskFailure(JobInProgress.java:1580)
>         at 
> org.apache.hadoop.mapred.JobInProgress.failedTask(JobInProgress.java:2908)
>         at 
> org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:1025)
>         at 
> org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:3869)
>         at 
> org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:3081)
>         at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:2819)
>         at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:960)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:958)
> 2009-07-08 11:58:26,162 INFO  mapred.TaskTracker 
> (TaskTracker.java:transmitHeartBeat(1196)) - Resending 'status' to 
> 'localhost' with reponseId '6

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
