very long cleanup after a job fails
-----------------------------------

         Key: HADOOP-244
         URL: http://issues.apache.org/jira/browse/HADOOP-244
     Project: Hadoop
        Type: Bug
  Components: mapred
    Reporter: Yoram Arnon
 Assigned to: Sameer Paranjpye

Eight hours after a job failed (it executed for about 14 hours prior to failing), many task trackers keep throwing the exceptions below:

060523 121732 Server handler 0 on 50040 caught: java.io.FileNotFoundException: LocalFS
java.io.FileNotFoundException: LocalFS
        at org.apache.hadoop.fs.LocalFileSystem.openRaw(LocalFileSystem.java:123)
        at org.apache.hadoop.fs.FSDataInputStream$Checker.<init>(FSDataInputStream.java:46)
        at org.apache.hadoop.fs.FSDataInputStream.<init>(FSDataInputStream.java:228)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:157)
        at org.apache.hadoop.mapred.MapOutputFile.write(MapOutputFile.java:116)
        at org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:151)
        at org.apache.hadoop.io.ObjectWritable.write(ObjectWritable.java:64)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:230)
060523 121814 task_0006_r_000123_0 copy failed: task_0006_m_046105_0 from node5:50040
java.net.SocketTimeoutException: timed out waiting for rpc response
        at org.apache.hadoop.ipc.Client.call(Client.java:305)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:150)
        at org.apache.hadoop.mapred.$Proxy2.getFile(Unknown Source)
        at org.apache.hadoop.mapred.ReduceTaskRunner.prepare(ReduceTaskRunner.java:112)
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:67)
060523 121814 task_0006_r_000123_0 0.13023989% reduce > copy > [EMAIL PROTECTED]:50040
060523 121814 task_0006_r_000123_0 Copying task_0006_m_048815_0 output from node6
060523 121817 SEVERE Can't open map output:/hadoop/mapred/local/task_0006_m_031921_0/part-152.out
java.io.FileNotFoundException: LocalFS
        at org.apache.hadoop.fs.LocalFileSystem.openRaw(LocalFileSystem.java:123)
        at org.apache.hadoop.fs.FSDataInputStream$Checker.<init>(FSDataInputStream.java:46)
        at org.apache.hadoop.fs.FSDataInputStream.<init>(FSDataInputStream.java:228)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:157)
        at org.apache.hadoop.mapred.MapOutputFile.write(MapOutputFile.java:116)
        at org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:151)
        at org.apache.hadoop.io.ObjectWritable.write(ObjectWritable.java:64)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:230)
060523 121817 Unknown child with bad map output: task_0006_m_031921_0. Ignored.
060523 121817 Server handler 1 on 50040 caught: java.io.FileNotFoundException: LocalFS
java.io.FileNotFoundException: LocalFS
        at org.apache.hadoop.fs.LocalFileSystem.openRaw(LocalFileSystem.java:123)
        at org.apache.hadoop.fs.FSDataInputStream$Checker.<init>(FSDataInputStream.java:46)
        at org.apache.hadoop.fs.FSDataInputStream.<init>(FSDataInputStream.java:228)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:157)
        at org.apache.hadoop.mapred.MapOutputFile.write(MapOutputFile.java:116)
        at org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:151)
        at org.apache.hadoop.io.ObjectWritable.write(ObjectWritable.java:64)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:230)
060523 121914 task_0006_r_000123_0 copy failed: task_0006_m_048815_0 from node6:50040
java.net.SocketTimeoutException: timed out waiting for rpc response
        at org.apache.hadoop.ipc.Client.call(Client.java:305)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:150)
        at org.apache.hadoop.mapred.$Proxy2.getFile(Unknown Source)
        at org.apache.hadoop.mapred.ReduceTaskRunner.prepare(ReduceTaskRunner.java:112)
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:67)