[ https://issues.apache.org/jira/browse/HADOOP-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566518#action_12566518 ]

Amareshwari Sri Ramadasu commented on HADOOP-2391:
--------------------------------------------------

As far as I understand, we are trying to solve this problem: "If the JT is 
alive, tasks will eventually be cleaned up by the TaskCommitThread, which 
discards their outputs. We are left with garbage in DFS if the JT dies 
before cleaning up."

With option 1, we delete ${mapred.output.dir}/_tmp before we declare the job 
successful, which removes all the garbage. Once we fix HADOOP-2759, we will 
be safe, because a task that then tries to create files (output files or 
side-files) will fail. I think option 1 alone would solve the problem once 
HADOOP-2759 is done, since tasks failing after job completion are ignored. 
Do we still need to wait until all the speculative tasks die? That would 
take a long time before the job can be declared Succeeded.
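
A minimal sketch of what option 1 amounts to, as I understand it. The 
OutputCleanup class and cleanupJobTmp helper are illustrative names, not 
code from the attached patch, and it assumes the temporary directory sits 
at ${mapred.output.dir}/_tmp:

import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class OutputCleanup {
  // Recursively delete ${mapred.output.dir}/_tmp before the job is
  // declared SUCCEEDED, so nothing a straggling speculative task wrote
  // there can be mistaken for committed output.
  public static void cleanupJobTmp(JobConf conf) throws IOException {
    Path outputDir = new Path(conf.get("mapred.output.dir"));
    Path tmpDir = new Path(outputDir, "_tmp");
    FileSystem fs = tmpDir.getFileSystem(conf);
    if (fs.exists(tmpDir) && !fs.delete(tmpDir, true)) {
      throw new IOException("Could not delete " + tmpDir);
    }
    // With HADOOP-2759 in place, a task that later tries to recreate
    // files under _tmp will fail, and failures after job completion
    // are ignored.
  }
}

The ordering is the point: the delete precedes the state change to 
SUCCEEDED, so the garbage is gone before any downstream job can read the 
output directory.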

> Speculative Execution race condition with output paths
> ------------------------------------------------------
>
>                 Key: HADOOP-2391
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2391
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>         Environment: all
>            Reporter: Dennis Kubes
>            Assignee: Devaraj Das
>             Fix For: 0.16.1
>
>         Attachments: HADOOP-2391-1-20071211.patch
>
>
> I am tracking a problem where, when speculative execution is enabled, there 
> is a race condition when trying to read output paths from a previously 
> completed job.  More specifically, when reduce tasks run, their output is 
> put into a working directory under the task name until the task is 
> completed.  The directory name is something like workdir/_taskid.  Upon 
> completion the output gets moved into workdir.  Regular tasks are checked 
> for this move and are not considered completed until it is made.  I have 
> not verified it, but all indications point to speculative tasks NOT having 
> this same check for completion and, more importantly, not being removed 
> when killed.  So what we end up with, when trying to read the output of 
> previous tasks with speculative execution enabled, is the possibility that 
> a previous workdir/_taskid will be present when the output directory is 
> read by a chained job (see the reader-side filter sketched below).  Here 
> is an error that supports my theory:
> Generator: org.apache.hadoop.ipc.RemoteException: java.io.IOException: Cannot open filename /u01/hadoop/mapred/temp/generate-temp-1197104928603/_task_200712080949_0005_r_000014_1
>         at org.apache.hadoop.dfs.NameNode.open(NameNode.java:234)
>         at sun.reflect.GeneratedMethodAccessor64.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:389)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:644)
>         at org.apache.hadoop.ipc.Client.call(Client.java:507)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:186)
>         at org.apache.hadoop.dfs.$Proxy0.open(Unknown Source)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>         at org.apache.hadoop.dfs.$Proxy0.open(Unknown Source)
>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:839)
>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.<init>(DFSClient.java:831)
>         at org.apache.hadoop.dfs.DFSClient.open(DFSClient.java:263)
>         at org.apache.hadoop.dfs.DistributedFileSystem.open(DistributedFileSystem.java:114)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1356)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1349)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1344)
>         at org.apache.hadoop.mapred.SequenceFileOutputFormat.getReaders(SequenceFileOutputFormat.java:87)
>         at org.apache.nutch.crawl.Generator.generate(Generator.java:429)
>         at org.apache.nutch.crawl.Generator.run(Generator.java:563)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:54)
>         at org.apache.nutch.crawl.Generator.main(Generator.java:526)
> I will continue to research this and post as I make progress on tracking down 
> this bug.
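>
> A defensive workaround on the reading side is to skip any name starting 
> with an underscore when collecting a job's output files, so a leftover 
> _taskid directory from a killed speculative attempt cannot break a chained 
> job. A minimal sketch against the FileSystem API; VisibleOutputs and 
> listOutputs are illustrative names, not existing Hadoop code:
>
> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.List;
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileStatus;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.fs.PathFilter;
>
> public class VisibleOutputs {
>   // Skip task-private names such as _task_<id> working directories and
>   // _tmp; dot-files are skipped for the same reason.
>   private static final PathFilter VISIBLE = new PathFilter() {
>     public boolean accept(Path p) {
>       String name = p.getName();
>       return !name.startsWith("_") && !name.startsWith(".");
>     }
>   };
>
>   // List only the committed files of a job's output directory, ignoring
>   // leftovers from killed or still-running speculative attempts.
>   public static List<Path> listOutputs(Configuration conf, Path outputDir)
>       throws IOException {
>     FileSystem fs = outputDir.getFileSystem(conf);
>     List<Path> result = new ArrayList<Path>();
>     for (FileStatus stat : fs.listStatus(outputDir, VISIBLE)) {
>       result.add(stat.getPath());
>     }
>     return result;
>   }
> }
>
> This only masks the race for readers, of course; the real fix is cleaning 
> up the _taskid directories themselves.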

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
