[jira] Commented: (MAPREDUCE-733) When running ant test TestTrackerBlacklistAcrossJobs, losing task tracker heartbeat exception occurs.

Devaraj Das (JIRA) Wed, 08 Jul 2009 11:31:43 -0700

    [ https://issues.apache.org/jira/browse/MAPREDUCE-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728841#action_12728841 ]


Devaraj Das commented on MAPREDUCE-733:
---------------------------------------

bq. tracker being globally blacklisted (and declared 'unhealthy' ?) isn't 
propagated to the job (via JobInProgress.addTrackerTaskFailure).
Do you mean that this should be done for every job that is currently running 
*and* for future jobs? Is there a good use case for this? I am asking because 
globally blacklisted trackers are not considered for assigning new tasks at 
all. We may run into race conditions (especially for future jobs) in which a 
job does not consider a tracker for new tasks even after that tracker has been 
marked healthy globally. For example, a job starts an hour before the tracker 
is due to be marked healthy; because the job blacklisted it prematurely, the 
job cannot use the tracker even after it is marked healthy again. This can 
probably be handled, but it might complicate the logic of global blacklisting 
and per-job blacklisting.
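
To make the race concrete, here is a minimal sketch. It is a toy model, not 
the JobTracker code: the class and method names (GlobalBlacklist, 
JobBlacklist, canAssign) are made up for illustration, and it assumes that a 
per-job blacklist entry is never removed once it has been added.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the two blacklists; none of these names are Hadoop's.
class BlacklistRaceSketch {

    /** Global view: trackers currently considered unhealthy cluster-wide. */
    static final class GlobalBlacklist {
        private final Set<String> blacklisted = new HashSet<>();
        void blacklist(String tracker) { blacklisted.add(tracker); }
        void markHealthy(String tracker) { blacklisted.remove(tracker); }
        boolean isBlacklisted(String tracker) { return blacklisted.contains(tracker); }
    }

    /** Per-job view. Assumption: an entry stays for the lifetime of the job. */
    static final class JobBlacklist {
        private final Set<String> blacklisted = new HashSet<>();
        void addTrackerFailure(String tracker) { blacklisted.add(tracker); }
        boolean isBlacklisted(String tracker) { return blacklisted.contains(tracker); }
    }

    /** A tracker is usable by a job only if neither list excludes it. */
    static boolean canAssign(GlobalBlacklist global, JobBlacklist job, String tracker) {
        return !global.isBlacklisted(tracker) && !job.isBlacklisted(tracker);
    }

    public static void main(String[] args) {
        GlobalBlacklist global = new GlobalBlacklist();
        JobBlacklist job = new JobBlacklist();
        String tracker = "tracker_host1";

        global.blacklist(tracker);      // tracker is blacklisted globally
        job.addTrackerFailure(tracker); // ...and that is propagated to the job
        global.markHealthy(tracker);    // later the tracker recovers globally

        // The stale per-job entry still keeps the job away from the tracker.
        System.out.println("globally blacklisted: " + global.isBlacklisted(tracker));
        System.out.println("job can assign:       " + canAssign(global, job, tracker));
    }
}
```

In this model the tracker ends up globally healthy but still unusable by the 
job. Avoiding that would need some way to expire or clear the per-job entry 
when the global state changes, which is the extra complexity I am worried 
about.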

> When running ant test TestTrackerBlacklistAcrossJobs, losing task tracker 
> heartbeat exception occurs. 
> ------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-733
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-733
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>            Reporter: Iyappan Srinivasan
>         Attachments: MAPREDUCE-733_0_20090708.patch
>
>
> When running ant test TestTrackerBlacklistAcrossJobs, losing task tracker 
> heartbeat. 
> It seems that when a task tracker is killed, it throws an exception. Instead, 
> it should catch and process the exception and allow the rest of the flow to 
> proceed.
> 2009-07-08 11:58:26,116 INFO  ipc.Server (Server.java:run(973)) - IPC Server 
> handler 7 on 40193, call 
> heartbeat(org.apache.hadoop.mapred.tasktrackersta...@13ec758, false, false, 
> true, 6) from 127.0.0.1:40200: error: java.io.IOException: 
> java.lang.RuntimeException: tracker_host1.rack.com:localhost/127.0.0.1:40197 
> already has slots reserved for null; being asked to un-reserve for 
> job_200907081158_0001
> java.io.IOException: java.lang.RuntimeException: 
> tracker_host1.rack.com:localhost/127.0.0.1:40197 already has slots reserved 
> for null; being asked to un-reserve for job_200907081158_0001
>         at org.apache.hadoop.mapreduce.server.jobtracker.TaskTracker.unreserveSlots(TaskTracker.java:162)
>         at org.apache.hadoop.mapred.JobInProgress.addTrackerTaskFailure(JobInProgress.java:1580)
>         at org.apache.hadoop.mapred.JobInProgress.failedTask(JobInProgress.java:2908)
>         at org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:1025)
>         at org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:3869)
>         at org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:3081)
>         at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:2819)
>         at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:960)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:958)
> 2009-07-08 11:58:26,162 INFO  mapred.TaskTracker 
> (TaskTracker.java:transmitHeartBeat(1196)) - Resending 'status' to 
> 'localhost' with reponseId '6
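
The message in the trace ("already has slots reserved for null; being asked 
to un-reserve for job_200907081158_0001") reads like an un-reserve request 
for a job that holds no reservation on the tracker. The following is only a 
sketch of that situation and of one tolerant alternative; 
SlotReservationSketch and its methods are hypothetical and are not the code 
in TaskTracker.java.

```java
// Hypothetical model of a tracker's slot reservation; not Hadoop's actual code.
class SlotReservationSketch {
    private final String trackerName;
    private String reservedForJob; // null means no job holds a reservation

    SlotReservationSketch(String trackerName) {
        this.trackerName = trackerName;
    }

    void reserve(String jobId) {
        reservedForJob = jobId;
    }

    String reservedFor() {
        return reservedForJob;
    }

    /** Strict variant: throws when jobId does not hold the reservation. */
    void unreserveStrict(String jobId) {
        if (!jobId.equals(reservedForJob)) {
            throw new RuntimeException(trackerName
                + " already has slots reserved for " + reservedForJob
                + "; being asked to un-reserve for " + jobId);
        }
        reservedForJob = null;
    }

    /** Tolerant variant: reports false instead of throwing, so the caller can continue. */
    boolean unreserveIfHeld(String jobId) {
        if (!jobId.equals(reservedForJob)) {
            return false;
        }
        reservedForJob = null;
        return true;
    }

    public static void main(String[] args) {
        SlotReservationSketch tracker = new SlotReservationSketch("tracker_host1");
        try {
            tracker.unreserveStrict("job_0001"); // nothing reserved yet
        } catch (RuntimeException e) {
            System.out.println("strict:   " + e.getMessage());
        }
        System.out.println("tolerant: " + tracker.unreserveIfHeld("job_0001"));
    }
}
```

With the strict variant, the exception escapes into the heartbeat handling 
and the tracker resends its status, as in the log above. The tolerant variant 
is one possible shape for "catch it and process it"; whether silently 
skipping the un-reserve is correct for the real scheduler is a separate 
question.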

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
