Re: Strange RemoteException thrown while doing a parse of ~64m documents

Dennis Kubes Sun, 07 Oct 2007 22:31:51 -0700

This happens when two reduce tasks try to write to the same output folder, usually on DFS. Was this a Nutch parse job or a custom MapReduce job?
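A simplified sketch may help show why a second writer is rejected. It assumes a toy model in which the name node keeps one write lease per file path and refuses a new create on a path that already has a lease. The class `LeaseTableDemo`, its methods and the example path are hypothetical and are not Hadoop's `FSNamesystem`. The two failure messages mirror the wording in the trace below.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model (hypothetical, NOT Hadoop's FSNamesystem) of a name node that
 * grants at most one write lease per file path.
 */
public class LeaseTableDemo {
    /** Thrown when a path already has an active writer lease. */
    public static class AlreadyBeingCreated extends RuntimeException {
        AlreadyBeingCreated(String msg) { super(msg); }
    }

    // path -> name of the client currently holding the write lease
    private final Map<String, String> leases = new HashMap<>();

    /** Grants a lease on path to holder, or fails if any lease is already held. */
    public void create(String path, String holder) {
        String current = leases.get(path);
        if (current != null) {
            // Same holder re-creating vs. a different writer colliding.
            String why = current.equals(holder)
                    ? "current leaseholder is trying to recreate file"
                    : "file is already being created by " + current;
            throw new AlreadyBeingCreated(
                    "failed to create file " + path + " for " + holder + " because " + why);
        }
        leases.put(path, holder);
    }

    /** Releases the lease when its holder closes the file. */
    public void close(String path, String holder) {
        leases.remove(path, holder);
    }

    public static void main(String[] args) {
        LeaseTableDemo nn = new LeaseTableDemo();
        String out = "/out/part-00008/data"; // hypothetical shared output path
        nn.create(out, "reduce-A");
        try {
            nn.create(out, "reduce-B"); // a second reducer targeting the same path
        } catch (AlreadyBeingCreated e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the fix is the same as in practice: give each task its own output path, or make sure only one writer opens a given file at a time.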


Dennis Kubes


Ned Rockson wrote:

This is the second time I've run this large parse of ~64m documents.
Both times, this exception was thrown during the reduce phase. Has
anyone seen this before, or could someone explain what is going on
here? The full stack trace follows:
        

org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /disks/d0/nutch/mapreduce/system/job_0001/tip_0001_r_000008/task_0001_r_000008_0/data for DFSClient_task_0001_r_000008_0 on client 208.96.54.73 because current leaseholder is trying to recreate file.
        at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:669)
        at org.apache.hadoop.dfs.NameNode.create(NameNode.java:283)
        at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:336)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:559)

        at org.apache.hadoop.ipc.Client.call(Client.java:469)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:163)
        at org.apache.hadoop.dfs.$Proxy1.create(Unknown Source)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateNewBlock(DFSClient.java:1119)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:1057)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.endBlock(DFSClient.java:1283)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.flush(DFSClient.java:1236)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.write(DFSClient.java:1218)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:38)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.fs.ChecksumFileSystem$FSOutputSummer.write(ChecksumFileSystem.java:395)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:38)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.io.SequenceFile$RecordCompressWriter.append(SequenceFile.java:884)
        at org.apache.hadoop.io.MapFile$Writer.append(MapFile.java:162)
        at org.apache.nutch.parse.ParseOutputFormat$1.write(ParseOutputFormat.java:208)
        at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:311)
        at org.apache.nutch.parse.ParseSegment.reduce(ParseSegment.java:117)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:326)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
