Qinghe Jin created MESOS-290:
--------------------------------

             Summary: Jobtracker can't get TaskTrackerInfo when the JobTracker 
log file is deleted
                 Key: MESOS-290
                 URL: https://issues.apache.org/jira/browse/MESOS-290
             Project: Mesos
          Issue Type: Bug
          Components: framework, java-api
    Affects Versions: 0.9.0
         Environment: SUSE Linux Enterprise Server 11
            Reporter: Qinghe Jin
            Priority: Blocker


For some reason, the JobTracker log file grew to over 20 GB and filled up my 
disk partition. I deleted the JobTracker log file in logs/ and restarted the 
Hadoop system, but now my MapReduce jobs no longer run. The JobTracker is 
throwing IOExceptions; the log and stack trace look like this:

2012-10-10 09:19:31,838 INFO org.apache.hadoop.mapred.JobTracker: Adding 
tracker tracker_blade17:localhost.localdomain/127.0.0.1:44216 to host blade17 
2012-10-10 09:19:31,839 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker 
'tracker_blade19:localhost.localdomain/127.0.0.1:40465'
2012-10-10 09:19:31,839 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 
on 9001, call heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@7be536d6, 
true, true, true, -1) from 10.10.129.17:57073: error: java.io.IOException: 
java.lang.RuntimeException: Expecting TaskTrackerInfo for host blade17
java.io.IOException: java.lang.RuntimeException: Expecting TaskTrackerInfo for host blade17
  at org.apache.hadoop.mapred.FrameworkScheduler.assignTasks(FrameworkScheduler.java:518)
  at org.apache.hadoop.mapred.MesosScheduler.assignTasks(MesosScheduler.java:76)
  at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:3398)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:396)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
2012-10-10 09:19:31,839 INFO org.apache.hadoop.mapred.JobTracker: Adding 
tracker tracker_blade19:localhost.localdomain/127.0.0.1:40465 to host blade19
2012-10-10 09:19:31,839 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 
on 9001, call heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@58651e95, 
true, true, true, -1) from 10.10.129.19:46705: error: java.io.IOException: 
java.lang.RuntimeException: Expecting TaskTrackerInfo for host blade19


On the TaskTracker side, it keeps resending its status to the JobTracker, but 
with responseId -1, as shown below:


2012-10-10 09:31:24,463 INFO org.apache.hadoop.mapred.TaskTracker: Resending 
'status' to 'blade20' with reponseId '-1
2012-10-10 09:31:24,466 INFO org.apache.hadoop.mapred.TaskTracker: Resending 
'status' to 'blade20' with reponseId '-1
2012-10-10 09:31:24,468 INFO org.apache.hadoop.mapred.TaskTracker: Resending 
'status' to 'blade20' with reponseId '-1
2012-10-10 09:31:24,471 INFO org.apache.hadoop.mapred.TaskTracker: Resending 
'status' to 'blade20' with reponseId '-1
2012-10-10 09:31:24,473 INFO org.apache.hadoop.mapred.TaskTracker: Resending 
'status' to 'blade20' with reponseId '-1
2012-10-10 09:31:24,476 INFO org.apache.hadoop.mapred.TaskTracker: Resending 
'status' to 'blade20' with reponseId '-1
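
Going by the timestamps above, the resends are only a few milliseconds apart. 
That may be related to how quickly the logs grow, though that is a guess and 
not something confirmed here. A minimal Python sketch for measuring the gaps, 
assuming the timestamp layout seen in the excerpt above; the function name 
resend_gaps_ms is made up for illustration:

```python
import re
from datetime import datetime, timedelta

# Matches the start of the TaskTracker "Resending" lines quoted above.
# Assumption: timestamps look like "2012-10-10 09:31:24,463".
RESEND = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO "
    r"org\.apache\.hadoop\.mapred\.TaskTracker: Resending"
)

def resend_gaps_ms(lines):
    """Return the gaps, in whole milliseconds, between consecutive resend lines."""
    stamps = []
    for line in lines:
        m = RESEND.match(line)
        if m:
            stamps.append(datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f"))
    one_ms = timedelta(milliseconds=1)
    return [(later - earlier) // one_ms for earlier, later in zip(stamps, stamps[1:])]
```

Fed the six resend lines quoted above, this gives gaps of 3, 2, 3, 2 and 3 ms.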

Is there a quick fix or workaround for this situation?


