This was a normal Nutch parse job. I'm still not sure what caused
the bug, but it stopped occurring last week.
On 10/7/07, Dennis Kubes wrote:
> This happens when two reduce tasks try to write to the same output
> folder, usually on the DFS. Was this a Nutch parse job or a custom
> MapReduce job?
>
> Dennis Kubes
>
> Ned Rockson wrote:
> > This is the second time I've run this large parse of ~64m documents.
> > Both times, the exception below was thrown during the reduce phase.
> > Has anyone seen this before, or could someone explain what is going
> > on here? The full stack trace follows:
> >
> >
> > org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /disks/d0/nutch/mapreduce/system/job_0001/tip_0001_r_000008/task_0001_r_000008_0/data for DFSClient_task_0001_r_000008_0 on client 208.96.54.73 because current leaseholder is trying to recreate file.
> > at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:669)
> > at org.apache.hadoop.dfs.NameNode.create(NameNode.java:283)
> > at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > at java.lang.reflect.Method.invoke(Method.java:585)
> > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:336)
> > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:559)
> >
> > at org.apache.hadoop.ipc.Client.call(Client.java:469)
> > at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:163)
> > at org.apache.hadoop.dfs.$Proxy1.create(Unknown Source)
> > at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateNewBlock(DFSClient.java:1119)
> > at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:1057)
> > at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.endBlock(DFSClient.java:1283)
> > at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.flush(DFSClient.java:1236)
> > at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.write(DFSClient.java:1218)
> > at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:38)
> > at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
> > at java.io.DataOutputStream.write(DataOutputStream.java:90)
> > at org.apache.hadoop.fs.ChecksumFileSystem$FSOutputSummer.write(ChecksumFileSystem.java:395)
> > at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:38)
> > at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> > at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
> > at java.io.DataOutputStream.write(DataOutputStream.java:90)
> > at org.apache.hadoop.io.SequenceFile$RecordCompressWriter.append(SequenceFile.java:884)
> > at org.apache.hadoop.io.MapFile$Writer.append(MapFile.java:162)
> > at org.apache.nutch.parse.ParseOutputFormat$1.write(ParseOutputFormat.java:208)
> > at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:311)
> > at org.apache.nutch.parse.ParseSegment.reduce(ParseSegment.java:117)
> > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:326)
> > at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
>