[ https://issues.apache.org/jira/browse/HADOOP-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550297 ]
Devaraj Das commented on HADOOP-2391:
-------------------------------------

bq. Another, perhaps better solution would simply be to store the temporary task output somewhere else, NOT beneath the output directory

+1

> Speculative Execution race condition with output paths
> ------------------------------------------------------
>
>                 Key: HADOOP-2391
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2391
>             Project: Hadoop
>          Issue Type: Bug
>         Environment: all
>            Reporter: Dennis Kubes
>            Assignee: Dennis Kubes
>
> I am tracking a problem where, when speculative execution is enabled, there is
> a race condition when trying to read output paths from a previously completed
> job. More specifically, when reduce tasks run, their output is put into a
> working directory under the task name until the task is completed. The
> directory name is something like workdir/_taskid. Upon completion the output
> gets moved into workdir. Regular tasks are checked for this move and are not
> considered completed until this move is made. I have not verified it, but all
> indications point to speculative tasks NOT having this same check for
> completion and, more importantly, removal when killed. So what we end up with
> when trying to read the output of previous tasks with speculative execution
> enabled is the possibility that a previous workdir/_taskid will be present when
> the output directory is read by a chained job.
> Here is an error which supports my theory:
> Generator: org.apache.hadoop.ipc.RemoteException: java.io.IOException: Cannot
> open filename
> /u01/hadoop/mapred/temp/generate-temp-1197104928603/_task_200712080949_0005_r_000014_1
>         at org.apache.hadoop.dfs.NameNode.open(NameNode.java:234)
>         at sun.reflect.GeneratedMethodAccessor64.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:389)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:644)
>         at org.apache.hadoop.ipc.Client.call(Client.java:507)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:186)
>         at org.apache.hadoop.dfs.$Proxy0.open(Unknown Source)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>         at org.apache.hadoop.dfs.$Proxy0.open(Unknown Source)
>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:839)
>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.<init>(DFSClient.java:831)
>         at org.apache.hadoop.dfs.DFSClient.open(DFSClient.java:263)
>         at org.apache.hadoop.dfs.DistributedFileSystem.open(DistributedFileSystem.java:114)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1356)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1349)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1344)
>         at org.apache.hadoop.mapred.SequenceFileOutputFormat.getReaders(SequenceFileOutputFormat.java:87)
>         at org.apache.nutch.crawl.Generator.generate(Generator.java:429)
>         at org.apache.nutch.crawl.Generator.run(Generator.java:563)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:54)
>         at org.apache.nutch.crawl.Generator.main(Generator.java:526)
> I will continue to research this and post as I make progress on tracking down
> this bug.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
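The failure above happens because the chained job opens every file in the job's output directory, including the stale _taskid working file left behind by a killed speculative attempt. Until the temporary output is moved out from under the output directory, one defensive workaround is for the reader to skip any entry whose name begins with an underscore. The following is a minimal, self-contained sketch of that filtering logic in plain Java; the class and method names, and the directory listing, are illustrative, not part of the Hadoop API discussed in this issue.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StaleTaskOutputFilter {

    // Keep only finished output files: skip any name beginning with an
    // underscore, since in-progress task output lives in directories
    // named like workdir/_taskid per the report above.
    static List<String> visibleOutputs(List<String> names) {
        return names.stream()
                .filter(name -> !name.startsWith("_"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical listing of an output directory that still contains
        // output left behind by a killed speculative reduce attempt.
        List<String> listing = Arrays.asList(
                "part-00000",
                "part-00001",
                "_task_200712080949_0005_r_000014_1");

        System.out.println(visibleOutputs(listing)); // prints [part-00000, part-00001]
    }
}
```

A reader applying this filter would only ever open the completed part files, so a leftover _taskid entry could no longer cause the "Cannot open filename" failure; it does not, however, fix the underlying race, which is why storing temporary output outside the output directory is the preferred fix.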