Hi, I have a simple map-reduce program [map only :) ] that reads the input and emits the same records to n outputs, running on a single-node cluster with the maximum number of map tasks set to 10 on a 16-core machine.
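Roughly, the mapper looks like the sketch below (a simplified version, not the exact code; the named outputs Output0..Output{n-1} and the test.num.outputs property are just placeholders). It uses the old mapred MultipleOutputs API, with each record copied to every named output.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.lib.MultipleOutputs;

    public class IdentityToNOutputsMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, LongWritable, Text> {

      private MultipleOutputs mos;
      private int n;

      @Override
      public void configure(JobConf job) {
        mos = new MultipleOutputs(job);
        // placeholder property: how many named outputs to write to
        n = job.getInt("test.num.outputs", 10);
      }

      @Override
      public void map(LongWritable key, Text value,
                      OutputCollector<LongWritable, Text> output, Reporter reporter)
          throws IOException {
        // emit the same record to every named output (Output0, Output1, ...)
        for (int i = 0; i < n; i++) {
          mos.getCollector("Output" + i, reporter).collect(key, value);
        }
      }

      @Override
      public void close() throws IOException {
        mos.close();  // flush and close all named-output writers
      }
    }

The driver registers each named output with MultipleOutputs.addNamedOutput(conf, "Output" + i, ...) and sets conf.setNumReduceTasks(0) so the job stays map-only.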
After a while the tasks begin to fail with the following exception log:

2011-01-01 03:17:52,149 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=temp,temp ip=/x.x.x.x cmd=delete src=/TestMultipleOuputs1320394241986/_temporary/_attempt_201101010256_0006_m_000000_2 dst=null perm=null
2011-01-01 03:17:52,156 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: addStoredBlock request received for blk_7046642930904717718_23143 on x.x.x.x:<port> size 66148 But it does not belong to any file.
2011-01-01 03:17:52,156 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: failed to complete /TestMultipleOuputs1320394241986/_temporary/_attempt_201101010256_0006_m_000000_2/Output0-m-00000 because dir.getFileBlocks() is null and pendingFile is null
2011-01-01 03:17:52,156 INFO org.apache.hadoop.ipc.Server: IPC Server handler 12 on 9000, call complete(/TestMultipleOuputs1320394241986/_temporary/_attempt_201101010256_0006_m_000000_2/Output0-m-00000, DFSClient_attempt_201101010256_0006_m_000000_2) from x.x.x.x:<port> error: java.io.IOException: Could not complete write to file /TestMultipleOuputs1320394241986/_temporary/_attempt_201101010256_0006_m_000000_2/Output0-m-00000 by DFSClient_attempt_201101010256_0006_m_000000_2
java.io.IOException: Could not complete write to file /TestMultipleOuputs1320394241986/_temporary/_attempt_201101010256_0006_m_000000_2/Output0-m-00000 by DFSClient_attempt_201101010256_0006_m_000000_2
    at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:497)
    at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:968)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:962)

It looks like FSNamesystem.audit records a delete of the attempt's _temporary directory just before the error saying it could not complete the write to a file inside that directory. Any clue on what could have gone wrong?

Thanks,
Sudharsan S