Qinghe Jin created MESOS-290:
--------------------------------
Summary: JobTracker can't get TaskTrackerInfo when the JobTracker
log file is deleted
Key: MESOS-290
URL: https://issues.apache.org/jira/browse/MESOS-290
Project: Mesos
Issue Type: Bug
Components: framework, java-api
Affects Versions: 0.9.0
Environment: SUSE Linux Enterprise Server 11
Reporter: Qinghe Jin
Priority: Blocker
For some reason, the JobTracker log file grew beyond 20 GB and filled up my
disk partition. I deleted the JobTracker log file in logs/ and restarted
Hadoop, but now my MapReduce jobs no longer run. The JobTracker is throwing
IOExceptions; the log and stack trace look like this:
2012-10-10 09:19:31,838 INFO org.apache.hadoop.mapred.JobTracker: Adding
tracker tracker_blade17:localhost.localdomain/127.0.0.1:44216 to host blade17
2012-10-10 09:19:31,839 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker
'tracker_blade19:localhost.localdomain/127.0.0.1:40465'
2012-10-10 09:19:31,839 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6
on 9001, call heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@7be536d6,
true, true, true, -1) from 10.10.129.17:57073: error: java.io.IOException:
java.lang.RuntimeException: Expecting TaskTrackerInfo for host blade17
java.io.IOException: java.lang.RuntimeException: Expecting TaskTrackerInfo for host blade17
    at org.apache.hadoop.mapred.FrameworkScheduler.assignTasks(FrameworkScheduler.java:518)
    at org.apache.hadoop.mapred.MesosScheduler.assignTasks(MesosScheduler.java:76)
    at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:3398)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
2012-10-10 09:19:31,839 INFO org.apache.hadoop.mapred.JobTracker: Adding
tracker tracker_blade19:localhost.localdomain/127.0.0.1:40465 to host blade19
2012-10-10 09:19:31,839 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7
on 9001, call heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@58651e95,
true, true, true, -1) from 10.10.129.19:46705: error: java.io.IOException:
java.lang.RuntimeException: Expecting TaskTrackerInfo for host blade19
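To see how many hosts are affected, the error message in the excerpt above can
be tallied per host. The following is only a rough sketch: the function name
count_missing_tracker_info is made up for illustration, and the pattern assumes
the message text is exactly "Expecting TaskTrackerInfo for host <name>" as in
the log lines quoted here.

```python
import re
from collections import Counter

# Matches the RuntimeException message quoted in the log excerpt.
# \s+ tolerates a line wrap between "for" and "host" (as in emailed logs).
PATTERN = re.compile(r"Expecting TaskTrackerInfo for\s+host (\S+)")


def count_missing_tracker_info(log_text):
    """Return a Counter mapping host name to the number of matching errors.

    Hypothetical helper for illustration; it is not part of Hadoop or Mesos.
    """
    return Counter(PATTERN.findall(log_text))
```

Passing the contents of the JobTracker log to this function gives a per-host
count, which may help tell whether every TaskTracker is rejected or only some.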
On the TaskTracker side, it keeps resending its status to the JobTracker, but
with responseId -1, as shown below:
2012-10-10 09:31:24,463 INFO org.apache.hadoop.mapred.TaskTracker: Resending
'status' to 'blade20' with reponseId '-1
2012-10-10 09:31:24,466 INFO org.apache.hadoop.mapred.TaskTracker: Resending
'status' to 'blade20' with reponseId '-1
2012-10-10 09:31:24,468 INFO org.apache.hadoop.mapred.TaskTracker: Resending
'status' to 'blade20' with reponseId '-1
2012-10-10 09:31:24,471 INFO org.apache.hadoop.mapred.TaskTracker: Resending
'status' to 'blade20' with reponseId '-1
2012-10-10 09:31:24,473 INFO org.apache.hadoop.mapred.TaskTracker: Resending
'status' to 'blade20' with reponseId '-1
2012-10-10 09:31:24,476 INFO org.apache.hadoop.mapred.TaskTracker: Resending
'status' to 'blade20' with reponseId '-1
Is there a quick fix or workaround for this situation?