Hi Lance,

It's possible this is related to the other JIRA (HADOOP-5761). If it's not too much trouble to try out the 19.2 branch from SVN, it would be helpful in determining whether this is a problem that's already fixed or if you've discovered something new.
Thanks
-Todd

On Fri, May 22, 2009 at 2:01 PM, Lance Riedel <la...@dotspots.com> wrote:

> Hi Todd,
> We had looked at that before.. here is the location of the tmp directory:
>
> [dotsp...@domu-12-31-38-00-80-21 hadoop-0.19.1]$ du -sh /dist/app/hadoop-0.19.1/tmp
> 248G /dist/app/hadoop-0.19.1/tmp
>
> There are no cron jobs that would have anything to do with that directory.
>
> Here is the /tmp
> [dotsp...@domu-12-31-38-00-80-21 hadoop-0.19.1]$ du -sh /tmp
> 204K /tmp
>
> Does this look like a disk error? I had seen that the
> "org.apache.hadoop.util.DiskChecker$DiskErrorException" is bogus.
>
> Thanks!
> Lance
>
> On Fri, May 22, 2009 at 9:33 AM, Lance Riedel <la...@dotspots.com> wrote:
>
> > Version 19.1 with patches:
> > 4780-2v19.patch (Jira 4780)
> > closeAll3.patch (Jira 3998)
> > I have confirmed that https://issues.apache.org/jira/browse/HADOOP-4924 patch is in, so that is not the fix.
> >
> > We are having task trackers die every night with a null pointer exception.
> > Usually 2 or so out of 8 (25% each night).
> >
> > Here are the logs:
> >
> > Version 19.1 with
> > 2009-05-22 02:46:49,911 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_200905211749_0451
> > 2009-05-22 02:46:49,911 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200905211749_0451_m_000000_0 done; removing files.
> > 2009-05-22 02:46:54,911 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_000009_0/output/file.out in any of the configured local directories
> > 2009-05-22 02:47:13,968 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_200905211749_0444
> > 2009-05-22 02:47:13,969 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200905211749_0444_m_000000_0 done; removing files.
> > 2009-05-22 02:47:18,968 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_000009_0/output/file.out in any of the configured local directories
> > 2009-05-22 02:48:52,324 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_000009_0/output/file.out in any of the configured local directories
> > 2009-05-22 02:49:10,779 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_200905211749_0452_m_000006_0 task's state:UNASSIGNED
> > 2009-05-22 02:49:10,779 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_200905211749_0452_m_000006_0
> > 2009-05-22 02:49:10,779 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 4 and trying to launch attempt_200905211749_0452_m_000006_0
> > 2009-05-22 02:49:15,274 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_200905211749_0452_m_1998728288 spawned.
> > 2009-05-22 02:49:15,765 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_200905211749_0452_m_1998728288 given task: attempt_200905211749_0452_m_000006_0
> > 2009-05-22 02:49:15,781 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_000009_0/output/file.out in any of the configured local directories
> > 2009-05-22 02:49:15,781 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200905211749_0452/attempt_200905211749_0452_m_000006_0/output/file.out in any of the configured local directories
> > 2009-05-22 02:49:19,784 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200905211749_0452_m_000006_0 1.0% hdfs://ec2-75-101-247-52.compute-1.amazonaws.com:54310/paragraphInstances/2009-05-22/rollup#06h#3e04c188-245a-4856-9a54-2fec60e85e3d.seq:0+9674259
> > 2009-05-22 02:49:19,785 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_200905211749_0452_m_000006_0 is done.
> > 2009-05-22 02:49:19,785 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_200905211749_0452_m_000006_0 was 0
> > 2009-05-22 02:49:19,787 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 4
> > 2009-05-22 02:49:19,954 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_200905211749_0452_m_1998728288 exited. Number of tasks it ran: 1
> > 2009-05-22 02:59:19,297 INFO org.apache.hadoop.mapred.TaskTracker: Recieved RenitTrackerAction from JobTracker
> > 2009-05-22 02:59:19,298 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.lang.NullPointerException
> >     at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.kill(TaskTracker.java:2300)
> >     at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.jobHasFinished(TaskTracker.java:2273)
> >     at org.apache.hadoop.mapred.TaskTracker.close(TaskTracker.java:840)
> >     at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1728)
> >     at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2785)
> >
> > 2009-05-22 02:59:19,300 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG:
> > /************************************************************
> > SHUTDOWN_MSG: Shutting down TaskTracker at domU-12-31-38-01-AD-91/10.253.178.95
> > ************************************************************/
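
To compare the trackers that die against the ones that survive, it may help to summarize each TaskTracker log rather than read it by eye. The following is only a rough sketch, not part of Hadoop: the function name `summarize_tasktracker_log` and both regular expressions are assumptions based solely on the log lines quoted above, and they expect each log entry on a single line, as in the original log file.

```python
import re
from collections import Counter

# Assumed patterns, derived only from the log excerpts quoted in this thread.
MISSING_PATH = re.compile(r"DiskErrorException: Could not find\s+(\S+)")
ERROR_LINE = re.compile(r"ERROR org\.apache\.hadoop\.mapred\.TaskTracker: (.*)")


def summarize_tasktracker_log(lines):
    """Return (missing, errors) for an iterable of TaskTracker log lines.

    missing: Counter mapping each path named in a DiskErrorException
             message to the number of times it was reported.
    errors:  list of messages logged at ERROR level by TaskTracker.
    """
    missing = Counter()
    errors = []
    for line in lines:
        path_match = MISSING_PATH.search(line)
        if path_match:
            missing[path_match.group(1)] += 1
        error_match = ERROR_LINE.search(line)
        if error_match:
            errors.append(error_match.group(1).strip())
    return missing, errors


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py <tasktracker-log-file>
    with open(sys.argv[1]) as log_file:
        missing, errors = summarize_tasktracker_log(log_file)
    for path, count in missing.most_common():
        print(count, path)
    for message in errors:
        print("ERROR:", message)
```

If the same missing `file.out` path shows up repeatedly before every crash, as it does for job_200905211749_0421 in the excerpt above, that would be worth mentioning in the JIRA.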