[ https://issues.apache.org/jira/browse/MAPREDUCE-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728841#action_12728841 ]
Devaraj Das commented on MAPREDUCE-733:
---------------------------------------
bq. tracker being globally blacklisted (and declared 'unhealthy' ?) isn't
propagated to the job (via JobInProgress.addTrackerTaskFailure).
Do you mean that this should be done for every job that is currently running
*and* for future jobs? Is there a good use case for this? I am asking because
globally blacklisted trackers are not considered for new task assignments at
all. We may run into race conditions (especially for future jobs) where a job
keeps ignoring a tracker for new task assignments even after the tracker has
been marked healthy again globally. For example, suppose a job starts an hour
before the tracker is due to be marked healthy. Because the job blacklists the
tracker prematurely, it still cannot use the tracker once the tracker is marked
healthy. This can probably be handled, but it might complicate the logic of
global and per-job blacklisting.
> When running ant test TestTrackerBlacklistAcrossJobs, losing task tracker
> heartbeat exception occurs.
> ------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-733
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-733
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: tasktracker
> Reporter: Iyappan Srinivasan
> Attachments: MAPREDUCE-733_0_20090708.patch
>
>
> When running ant test TestTrackerBlacklistAcrossJobs, the task tracker
> heartbeat is lost.
> It seems that when a task tracker is killed, an exception is thrown. Instead,
> the exception should be caught and handled so that the rest of the flow can
> go through.
> 2009-07-08 11:58:26,116 INFO ipc.Server (Server.java:run(973)) - IPC Server
> handler 7 on 40193, call
> heartbeat(org.apache.hadoop.mapred.tasktrackersta...@13ec758, false, false,
> true, 6) from 127.0.0.1:40200: error: java.io.IOException:
> java.lang.RuntimeException: tracker_host1.rack.com:localhost/127.0.0.1:40197
> already has slots reserved for null; being asked to un-reserve for
> job_200907081158_0001
> java.io.IOException: java.lang.RuntimeException:
> tracker_host1.rack.com:localhost/127.0.0.1:40197 already has slots reserved
> for null; being asked to un-reserve for job_200907081158_0001
> at org.apache.hadoop.mapreduce.server.jobtracker.TaskTracker.unreserveSlots(TaskTracker.java:162)
> at org.apache.hadoop.mapred.JobInProgress.addTrackerTaskFailure(JobInProgress.java:1580)
> at org.apache.hadoop.mapred.JobInProgress.failedTask(JobInProgress.java:2908)
> at org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:1025)
> at org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:3869)
> at org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:3081)
> at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:2819)
> at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:960)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:958)
> 2009-07-08 11:58:26,162 INFO mapred.TaskTracker
> (TaskTracker.java:transmitHeartBeat(1196)) - Resending 'status' to
> 'localhost' with reponseId '6
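The trace shows JobInProgress.addTrackerTaskFailure calling TaskTracker.unreserveSlots on a tracker that holds no reservation for the job ("reserved for null"). A minimal sketch of that bookkeeping follows, assuming a single reserved job per tracker. SlotReservation, unreserveStrict and unreserveLenient are hypothetical names, not Hadoop classes, and this is not the change in MAPREDUCE-733_0_20090708.patch. The strict variant reproduces the error message from the log; the lenient variant treats a missing reservation as a no-op, in the spirit of the report's suggestion to handle the condition instead of failing the heartbeat.

```java
import java.util.Objects;

// Hypothetical, simplified model of a tracker's fallow-slot reservation.
// NOT the Hadoop TaskTracker class; names are illustrative only.
class SlotReservation {
    private final String trackerName;
    private String reservedForJob; // null means no reservation is held

    SlotReservation(String trackerName) { this.trackerName = trackerName; }

    void reserve(String jobId) { reservedForJob = jobId; }

    String reservedFor() { return reservedForJob; }

    // Strict variant: throws when asked to release a reservation this job
    // does not hold, matching the message seen in the log above.
    void unreserveStrict(String jobId) {
        if (!Objects.equals(reservedForJob, jobId)) {
            throw new RuntimeException(trackerName + " already has slots reserved for "
                    + reservedForJob + "; being asked to un-reserve for " + jobId);
        }
        reservedForJob = null;
    }

    // Lenient variant: treats "nothing reserved for this job" as a no-op and
    // reports whether a reservation was actually released.
    boolean unreserveLenient(String jobId) {
        if (reservedForJob == null || !reservedForJob.equals(jobId)) {
            return false;
        }
        reservedForJob = null;
        return true;
    }
}
```

Whether the lenient behaviour is right depends on whether a mismatch can only mean "nothing to release" or may also indicate a genuine accounting bug. Silently returning false could hide the latter, so logging the mismatch may be preferable to ignoring it.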
--
This message is automatically generated by JIRA.