[ https://issues.apache.org/jira/browse/HADOOP-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen O'Malley resolved HADOOP-1098.
-----------------------------------
    Resolution: Duplicate

This was fixed by HADOOP-1127.

> output blocks lost when speculative execution is used
> -----------------------------------------------------
>
>                 Key: HADOOP-1098
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1098
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Nigel Daley
>            Priority: Critical
>             Fix For: 0.13.0
>
> The Sort benchmark completes successfully for me on the latest trunk (0.12.1 candidate) with speculation turned on. Validation of the Sort benchmark output, however, is failing. I see one sort output file (part-00375) that is way smaller than all the rest. In fact, it is exactly 1 block long.
>
> dfs ls output:
> ...
> /user/hadoopqa/sortBenchmark100/output/part-00373 <r 3>   2971688212
> /user/hadoopqa/sortBenchmark100/output/part-00374 <r 3>   2973451660
> /user/hadoopqa/sortBenchmark100/output/part-00375 <r 3>   134217728
> /user/hadoopqa/sortBenchmark100/output/part-00376 <r 3>   2972933208
> /user/hadoopqa/sortBenchmark100/output/part-00377 <r 3>   2972309956
> ...
>
> During the Sort benchmark, I see 9 AlreadyBeingCreatedExceptions in the NameNode log for this file (and more of these exceptions for other files too). I also include here the 1 PendingReplicationMonitor WARN message from the NameNode log in case it's relevant:
> ...
> 2007-03-08 21:56:31,747 WARN org.apache.hadoop.fs.FSNamesystem: PendingReplicationMonitor timed out block blk_-849195508701590166
> ...
> 2007-03-08 22:04:35,471 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.startFile: failed to create file /user/hadoopqa/sortBenchmark100/output/part-00375 for DFSClient_task_0002_r_000375_1 on client 72.30.38.16 because pendingCreates is non-null.
> 2007-03-08 22:04:35,476 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8020 call error: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /user/hadoopqa/sortBenchmark100/output/part-00375 for DFSClient_task_0002_r_000375_1 on client 72.30.38.16 because pendingCreates is non-null.
> org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /user/hadoopqa/sortBenchmark100/output/part-00375 for DFSClient_task_0002_r_000375_1 on client 72.30.38.16 because pendingCreates is non-null.
>     at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:701)
>     at org.apache.hadoop.dfs.NameNode.create(NameNode.java:283)
>     at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:336)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:559)
> ...
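For context: 134217728 bytes is exactly 128 MiB, a single block at the cluster's configured DFS block size, and the AlreadyBeingCreatedException above is consistent with a speculative attempt (DFSClient_task_0002_r_000375_1) asking the NameNode to create the same output path while the first attempt's lease is still open. The sketch below is purely illustrative, not the framework's own output-commit code and not the HADOOP-1127 change; it shows the attempt-scoped write-then-rename pattern that avoids such a collision, assuming current FileSystem API signatures, with the path, attempt id, and _temporary layout used only as examples.

// Illustrative sketch only: each attempt writes to its own scratch file and
// renames it into place on commit, so two speculative attempts never issue
// concurrent create() calls for the same final path.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AttemptScopedOutput {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path finalOut = new Path("/user/hadoopqa/sortBenchmark100/output/part-00375");

    // Per-attempt scratch path (layout here is an assumption, not the
    // framework's actual directory scheme).
    Path attemptOut = new Path(finalOut.getParent(),
        "_temporary/task_0002_r_000375_1/" + finalOut.getName());

    FSDataOutputStream out = fs.create(attemptOut);
    try {
      // ... write this attempt's records ...
    } finally {
      out.close();
    }

    // Only the attempt the framework commits renames its file into place;
    // a losing speculative attempt's scratch file is simply discarded.
    if (!fs.rename(attemptOut, finalOut)) {
      fs.delete(attemptOut, true);
    }
  }
}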
> During sort validation, I get this exception in the JobTracker log:
> 2007-03-08 22:51:32,017 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_0003_m_001379_0: java.io.EOFException
>     at java.io.DataInputStream.readFully(DataInputStream.java:180)
>     at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:57)
>     at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:91)
>     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1525)
>     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1436)
>     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1482)
>     at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:72)
>     at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
>
> I also saw this in the DataNode log during sort validation, but it could be unrelated:
> 2007-03-09 01:04:41,323 WARN org.apache.hadoop.dfs.DataNode: java.io.IOException: Unexpected error trying to delete block blk_-5047673597270588432. Block not found in blockMap.
>     at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:596)
>     at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:429)
>     at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1053)
>     at java.lang.Thread.run(Thread.java:619)
>
> Since the default for 0.12.1 will be for speculative execution to be turned off, I am assigning this to 0.13.0.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
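For anyone running the affected 0.12.x releases, a job can also disable speculative execution explicitly rather than relying on the release default. A minimal sketch, assuming the era's "mapred.speculative.execution" property name and the old JobConf/JobClient API; the class and job names are illustrative:

// Minimal sketch: turn off speculative execution for one job so two attempts
// never race to create the same DFS output file.
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class NoSpeculationJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(NoSpeculationJob.class);
    conf.setJobName("sort-without-speculation");

    // Property name assumed from that era's configuration; keeps only one
    // attempt per task.
    conf.setBoolean("mapred.speculative.execution", false);

    // ... set input/output formats, paths, mapper and reducer here ...

    JobClient.runJob(conf);
  }
}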