This happens when two reduce tasks try to write to the same output folder, usually on the DFS. Was this a Nutch parse job or a custom MapReduce job?
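
To make the collision concrete, here is a minimal toy sketch in plain Java. It is not Hadoop's implementation: the LeaseSketch class, its AlreadyBeingCreated exception, and the attemptPath layout are hypothetical names invented for illustration. It only assumes that a file being written has a single leaseholder, so a second create on the same path is rejected, while a separate path per task attempt avoids the clash.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of single-writer file creation; NOT Hadoop's actual code. */
public class LeaseSketch {

    /** Hypothetical stand-in for an "already being created" error. */
    static class AlreadyBeingCreated extends RuntimeException {
        AlreadyBeingCreated(String msg) {
            super(msg);
        }
    }

    // path -> name of the writer that currently holds the file open
    private final Map<String, String> leases = new HashMap<>();

    /** Registers holder as the writer of path, or fails if the path is already open. */
    void create(String path, String holder) {
        String current = leases.get(path);
        if (current != null) {
            throw new AlreadyBeingCreated("failed to create file " + path
                    + " for " + holder + "; current leaseholder is " + current);
        }
        leases.put(path, holder);
    }

    /** Releases the lease, but only if holder is the one that owns it. */
    void close(String path, String holder) {
        leases.remove(path, holder);
    }

    /** Builds an output path unique to one task attempt (hypothetical layout). */
    static String attemptPath(String outputDir, String attemptId) {
        return outputDir + "/" + attemptId + "/data";
    }

    public static void main(String[] args) {
        LeaseSketch fs = new LeaseSketch();

        // Two writers aimed at the same path: the second create is rejected.
        fs.create("/out/shared/data", "attempt_0");
        try {
            fs.create("/out/shared/data", "attempt_1");
        } catch (AlreadyBeingCreated e) {
            System.out.println("collision: " + e.getMessage());
        }

        // One path per attempt: both creates succeed.
        fs.create(attemptPath("/out", "attempt_0"), "attempt_0");
        fs.create(attemptPath("/out", "attempt_1"), "attempt_1");
        System.out.println("per-attempt paths did not collide");
    }
}
```

If the job is a custom one, the first thing to check under this model is whether two reducers (or two attempts of the same reducer) can compute the same output path.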

Dennis Kubes

Ned Rockson wrote:
This is the second time I've run this large parse of ~64m documents.
Both times, the exception below was thrown in the reduce phase. Has
anyone seen this before, or could someone explain what is going on
here? The full stack trace follows:
        

org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /disks/d0/nutch/mapreduce/system/job_0001/tip_0001_r_000008/task_0001_r_000008_0/data for DFSClient_task_0001_r_000008_0 on client 208.96.54.73 because current leaseholder is trying to recreate file.
        at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:669)
        at org.apache.hadoop.dfs.NameNode.create(NameNode.java:283)
        at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:336)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:559)

        at org.apache.hadoop.ipc.Client.call(Client.java:469)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:163)
        at org.apache.hadoop.dfs.$Proxy1.create(Unknown Source)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateNewBlock(DFSClient.java:1119)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:1057)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.endBlock(DFSClient.java:1283)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.flush(DFSClient.java:1236)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.write(DFSClient.java:1218)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:38)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.fs.ChecksumFileSystem$FSOutputSummer.write(ChecksumFileSystem.java:395)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:38)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.io.SequenceFile$RecordCompressWriter.append(SequenceFile.java:884)
        at org.apache.hadoop.io.MapFile$Writer.append(MapFile.java:162)
        at org.apache.nutch.parse.ParseOutputFormat$1.write(ParseOutputFormat.java:208)
        at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:311)
        at org.apache.nutch.parse.ParseSegment.reduce(ParseSegment.java:117)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:326)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
