[ https://issues.apache.org/jira/browse/HADOOP-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566183#action_12566183 ]

Arun C Murthy commented on HADOOP-2391:
---------------------------------------

We had a hallway discussion which covered the following options:

1. Use ${mapred.output.dir}/_temp/_${taskid} as the task's temporary output 
directory, as illustrated in Amareshwari's comment.

_Pros_:
a) Easy to implement.
b) Keeps the job's output inside ${mapred.output.dir}, so any junk files left 
behind on HDFS are at least visible (we assume the user will notice them 
*smile*).

_Cons_: This still doesn't solve the problem... the issue is that tasks might 
get launched as the job is completing and go ahead and create the _${taskid} 
directory (see HADOOP-2759, i.e. HDFS *create* automatically creates parent 
directories). The problem is further aggravated by tasks creating side-files 
in the _${taskid} directory; another point to remember is that the 
OutputFormat is user code...
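
To make the layout in option 1 concrete, here is a minimal, hypothetical 
sketch of staging task output under ${mapred.output.dir}/_temp/_${taskid} and 
promoting only the winning attempt's files. The class and method names are 
invented for illustration; this is not the actual framework code.

{code:java}
// Hypothetical sketch of option 1: each task writes under
// ${mapred.output.dir}/_temp/_${taskid}, and its files are promoted into
// ${mapred.output.dir} only when the attempt is declared successful.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TaskOutputPromoter {

  /** Directory a task should write into while it is still running. */
  public static Path tempTaskDir(Path outputDir, String taskId) {
    return new Path(new Path(outputDir, "_temp"), "_" + taskId);
  }

  /** Called only for the attempt that wins; losers are simply discarded. */
  public static void promote(Configuration conf, Path outputDir, String taskId)
      throws IOException {
    FileSystem fs = outputDir.getFileSystem(conf);
    Path tempDir = tempTaskDir(outputDir, taskId);
    if (!fs.exists(tempDir)) {
      return; // nothing to promote, e.g. the task produced no side-files
    }
    // Move every file the task produced up into the real output directory.
    for (FileStatus stat : fs.listStatus(tempDir)) {
      fs.rename(stat.getPath(), new Path(outputDir, stat.getPath().getName()));
    }
    fs.delete(tempDir, true); // remove the now-empty _${taskid} directory
  }

  /** Discard the output of a killed or failed (e.g. speculative) attempt. */
  public static void discard(Configuration conf, Path outputDir, String taskId)
      throws IOException {
    FileSystem fs = outputDir.getFileSystem(conf);
    fs.delete(tempTaskDir(outputDir, taskId), true);
  }
}
{code}

The discard() path is what a killed speculative attempt needs, and it is 
exactly the step the _Cons_ above say can race with a late-launched task 
recreating the directory.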

2. Put task outputs in a job-specific temporary system directory outside 
${mapred.output.dir} and then move them into ${mapred.output.dir}.

The problem with this approach is that although it is simple and solves the 
problems at hand, we might be left with random files on HDFS which will never 
be noticed by anyone - leading to space creep, and at the very least it 
requires a _tmpclean_ tool (sketched below). We also need to study how this 
will work with permissions and quotas.
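
To illustrate what such a _tmpclean_ tool might look like, here is a rough 
sketch; the job-temp layout, the age cutoff and the class name are all 
assumptions made for illustration, not an existing tool.

{code:java}
// Hypothetical _tmpclean_ sketch for option 2: sweep job temp directories
// under a system directory and remove any that look abandoned.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TmpClean {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path jobTempRoot = new Path(args[0]);      // e.g. /tmp/mapred/jobtemp
    long maxAgeMs = 7L * 24 * 60 * 60 * 1000;  // treat week-old dirs as junk

    long now = System.currentTimeMillis();
    for (FileStatus stat : fs.listStatus(jobTempRoot)) {
      if (now - stat.getModificationTime() > maxAgeMs) {
        System.out.println("Deleting abandoned temp dir " + stat.getPath());
        fs.delete(stat.getPath(), true);
      }
    }
  }
}
{code}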

3. Do not declare a job as complete till all its TIPs have succeeded and all 
speculative tasks are killed.

_Pros_:
a) It's probably the most _correct_ solution of the lot.
b) This will mostly work (see the _Cons_).

_Cons_:
a) Implementation is a little more involved... (we probably need to mark the 
Job as "done, but cleaning up" - see the sketch below).
b) There are corner cases: think of a job which is complete, and whose 
speculative tasks are running on a TaskTracker which is _lost_ before the task 
is killed... we need to wait at least 10 minutes (the current timeout) before 
declaring the TaskTracker as _lost_ and the job as SUCCESS. Even this doesn't 
guarantee that the _task_ is actually dead, since it could still be running on 
the TaskTracker node... and creating side-files etc. (again HADOOP-2759).
c) The _lost TaskTracker_ problem described above potentially adds a finite 
lag to jobs being declared a success. This doesn't play well with 
short-running jobs which need SLAs on completion/failure times (of course they 
can set the TaskTracker timeout to be less than 10 minutes on their clusters, 
just something to consider).
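
To illustrate the "done, but cleaning up" idea from the _Cons_ above, here is 
a minimal sketch of the extra job state. The real JobTracker state machine is 
more involved, and all names here are invented for illustration only.

{code:java}
// Hypothetical sketch of the extra job state option 3 implies: hold the job
// in a "cleaning up" state until every speculative attempt is known to be
// dead, or its TaskTracker has finally been declared lost (the ~10 minute
// lag described in cons b and c).
public class JobCompletionSketch {

  enum JobState { RUNNING, CLEANING_UP, SUCCEEDED }

  private JobState state = JobState.RUNNING;
  private int liveSpeculativeAttempts;

  /** Called when every TIP has at least one successful attempt. */
  void allTipsSucceeded(int speculativeAttemptsStillRunning) {
    liveSpeculativeAttempts = speculativeAttemptsStillRunning;
    state = (liveSpeculativeAttempts == 0) ? JobState.SUCCEEDED
                                           : JobState.CLEANING_UP;
  }

  /** Called when a speculative attempt is confirmed killed, or its
   *  TaskTracker is declared lost after the timeout. */
  void speculativeAttemptGone() {
    if (state == JobState.CLEANING_UP && --liveSpeculativeAttempts == 0) {
      state = JobState.SUCCEEDED;
    }
  }
}
{code}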

----

Overall, a combination of 1 & 3 - i.e. having a single ${mapred.output.dir}/_tmp 
as the parent of all temporary task directories, and also waiting for all tasks 
to be killed - might work well in most cases. For this to work we still need to 
fix HADOOP-2759, or at least add a *create* API which doesn't automatically 
create parent directories.

Thoughts?

Note: Adding a new *create* API which doesn't automatically create parent 
directories is one part of the solution; the other part is to educate users 
not to use the _old_ create API in their own OutputFormats (a sketch of the 
intended semantics follows).
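
For clarity, here is the behaviour such a non-parent-creating *create* could 
have. This is purely illustrative: a real fix for HADOOP-2759 would need the 
NameNode to enforce the check atomically; the client-side exists() check below 
is racy and is only meant to make the desired semantics concrete.

{code:java}
// Illustrative only: the semantics a non-parent-creating create would have.
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StrictCreate {
  public static FSDataOutputStream createNonRecursive(FileSystem fs, Path file)
      throws IOException {
    Path parent = file.getParent();
    if (parent != null && !fs.exists(parent)) {
      throw new FileNotFoundException(
          "Parent directory does not exist: " + parent);
    }
    return fs.create(file); // today's create() would have made the parent
  }
}
{code}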
  

> Speculative Execution race condition with output paths
> ------------------------------------------------------
>
>                 Key: HADOOP-2391
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2391
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>         Environment: all
>            Reporter: Dennis Kubes
>            Assignee: Devaraj Das
>             Fix For: 0.16.1
>
>         Attachments: HADOOP-2391-1-20071211.patch
>
>
> I am tracking a problem where, when speculative execution is enabled, there 
> is a race condition when trying to read output paths from a previously 
> completed job.  More specifically, when reduce tasks run, their output is put 
> into a working directory under the task name until the task is completed.  
> The directory name is something like workdir/_taskid.  Upon completion the 
> output gets moved into workdir.  Regular tasks are checked for this move and 
> not considered completed until this move is made.  I have not verified it, 
> but all indications point to speculative tasks NOT having this same check for 
> completion and, more importantly, removal when killed.  So what we end up 
> with when trying to read the output of previous tasks with speculative 
> execution enabled is the possibility that a previous workdir/_taskid will be 
> present when the output directory is read by a chained job.  Here is an error 
> which supports my theory:
> Generator: org.apache.hadoop.ipc.RemoteException: java.io.IOException: Cannot 
> open filename 
> /u01/hadoop/mapred/temp/generate-temp-1197104928603/_task_200712080949_0005_r_000014_1
>         at org.apache.hadoop.dfs.NameNode.open(NameNode.java:234)
>         at sun.reflect.GeneratedMethodAccessor64.invoke(Unknown Source)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:389)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:644)
>         at org.apache.hadoop.ipc.Client.call(Client.java:507)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:186)
>         at org.apache.hadoop.dfs.$Proxy0.open(Unknown Source)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>         at org.apache.hadoop.dfs.$Proxy0.open(Unknown Source)
>         at 
> org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:839)
>         at 
> org.apache.hadoop.dfs.DFSClient$DFSInputStream.<init>(DFSClient.java:831)
>         at org.apache.hadoop.dfs.DFSClient.open(DFSClient.java:263)
>         at 
> org.apache.hadoop.dfs.DistributedFileSystem.open(DistributedFileSystem.java:114)
>         at 
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1356)
>         at 
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1349)
>         at 
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1344)
>         at 
> org.apache.hadoop.mapred.SequenceFileOutputFormat.getReaders(SequenceFileOutputFormat.java:87)
>         at org.apache.nutch.crawl.Generator.generate(Generator.java:429)
>         at org.apache.nutch.crawl.Generator.run(Generator.java:563)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:54)
>         at org.apache.nutch.crawl.Generator.main(Generator.java:526)
> I will continue to research this and post as I make progress on tracking down 
> this bug.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.