[ https://issues.apache.org/jira/browse/NUTCH-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232914#comment-13232914 ]

Markus Jelsma commented on NUTCH-1315:
--------------------------------------

Speculative task execution is enabled by default, but the fetch and index jobs 
disable it. At some point we disabled speculative execution altogether, only 
because we needed those slots to be free for other jobs.

Should extended OutputFormats take care of this? It isn't clear from the 
MapRed API docs whether this is a problem: the name parameter must be unique 
for the task's part of the output across the entire job, which it is.

Wouldn't including a task ID in the output name cause a mess in the final 
output?
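If I read the old mapred API right, it normally avoids both the collision and 
the mess: FileOutputFormat.getTaskOutputPath() places each attempt's output 
under a per-attempt _temporary directory, and the committer promotes only the 
winning attempt, so attempt IDs never show up in the final output. A minimal 
illustration of the naming issue (paths and helper names are hypothetical, 
not Nutch's actual code):

```java
// Hypothetical sketch, not Nutch code: a fixed per-part path collides
// between two speculative attempts of the same reduce task, while a path
// scoped by the task attempt ID stays unique until a committer promotes
// exactly one copy into the fixed location.
public class AttemptNaming {

    // Fixed path built only from the task's part number: both attempts
    // of reduce 000001 produce the same string and race on HDFS.
    static String fixedPath(int part) {
        return String.format("parse_text/part-%05d/data", part);
    }

    // Attempt-scoped path: unique per attempt; the committer would later
    // rename the winner into the fixed location, discarding the loser.
    static String attemptPath(String attemptId, int part) {
        return String.format("_temporary/_%s/parse_text/part-%05d/data",
                             attemptId, part);
    }

    public static void main(String[] args) {
        String first  = "attempt_201203151054_0028_r_000001_0";
        String second = "attempt_201203151054_0028_r_000001_1";

        // Both attempts collide on the fixed path:
        System.out.println(fixedPath(1));           // parse_text/part-00001/data
        // But their attempt-scoped paths differ:
        System.out.println(attemptPath(first, 1));
        System.out.println(attemptPath(second, 1));
        System.out.println(
            attemptPath(first, 1).equals(attemptPath(second, 1)));  // false
    }
}
```

So including the attempt ID in the working path need not touch the final 
output at all, as long as only one attempt's files get committed.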

In the meantime I would indeed disable speculative execution. In my opinion, 
and in my experience with Nutch and other jobs, it's not really worth it: it 
takes up slots that you could use for other jobs, and even if there are no 
other jobs it still costs additional CPU cycles, RAM, and disk I/O for a few 
seconds. I should add that our network is homogeneous (admittedly one of the 
classic fallacies of distributed computing) and all nodes have roughly equal 
load.
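For reference, these are the Hadoop 1.x properties involved (names as in 
mapred-default.xml). The fragment below is a sketch for mapred-site.xml to 
disable speculation cluster-wide; setting the same properties in 
nutch-site.xml or a per-job conf should scope it to Nutch jobs only:

```xml
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
```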
                
> reduce speculation on but ParseOutputFormat doesn't name output files 
> correctly?
> --------------------------------------------------------------------------------
>
>                 Key: NUTCH-1315
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1315
>             Project: Nutch
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.4
>         Environment: ubuntu 64bit, hadoop 1.0.1, 3 Node Cluster, segment size 
> 1.5M urls
>            Reporter: Rafael
>              Labels: hadoop, hdfs
>
> From time to time the reducer log contains the following and one TaskTracker 
> gets blacklisted.
> org.apache.hadoop.ipc.RemoteException: 
> org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to 
> create file 
> /user/test/crawl/segments/20120316065507/parse_text/part-00001/data for 
> DFSClient_attempt_201203151054_0028_r_000001_1 on client xx.x.xx.xx.10, 
> because this file is already being created by 
> DFSClient_attempt_201203151054_0028_r_000001_0 on xx.xx.xx.9
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:1404)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1244)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1186)
>       at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:628)
>       at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:396)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1066)
>       at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>       at $Proxy2.create(Unknown Source)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>       at $Proxy2.create(Unknown Source)
>       at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:3245)
>       at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:713)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:182)
>       at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:555)
>       at 
> org.apache.hadoop.io.SequenceFile$RecordCompressWriter.<init>(SequenceFile.java:1132)
>       at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:397)
>       at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:354)
>       at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:476)
>       at org.apache.hadoop.io.MapFile$Writer.<init>(MapFile.java:157)
>       at org.apache.hadoop.io.MapFile$Writer.<init>(MapFile.java:134)
>       at org.apache.hadoop.io.MapFile$Writer.<init>(MapFile.java:92)
>       at 
> org.apache.nutch.parse.ParseOutputFormat.getRecordWriter(ParseOutputFormat.java:110)
>       at 
> org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.<init>(ReduceTask.java:448)
>       at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:490)
>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
>       at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:396)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
>       at org.apache.hadoop.mapred.Child.main(Child.java:249)
> I asked the hdfs-user mailing list and I got the following answer:
> "Looks like you have reduce speculation turned on, but the
> ParseOutputFormat you're using doesn't properly name its output files
> distinctly based on the task attempt ID. As a workaround you can
> probably turn off speculative execution for reduces, but you should
> also probably file a Nutch bug."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
