[ https://issues.apache.org/jira/browse/MAPREDUCE-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536133#comment-13536133 ]

Alejandro Abdelnur commented on MAPREDUCE-3772:
-----------------------------------------------

MultipleOutputs was implemented to work properly with speculative execution
enabled when used with FileOutputFormat implementations (Text, SequenceFile).
To handle speculative execution, FileOutputFormats write their output during
execution to the path *$mapred.out.dir/_temporary/_$taskid*. If speculative
execution is in progress for a given task, there will be two task attempt IDs
for it, so while the 'competing' attempts are running their outputs go to
different directories. When the first of the competing attempts completes, its
output is committed (moved to *$mapred.out.dir*), and the other attempt is
discarded along with its output. MultipleOutputs creates the named outputs
under the *_$taskid* directory, thus leveraging all the speculative execution
functionality and behavior implemented by FileOutputFormat. If the named output
file is not within the *_$taskid* directory, none of the logic just described
works, because the task commit procedure only moves files from the *_$taskid*
directory to *$mapred.out.dir*.
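
The commit rule above can be modelled in a few lines. This Java sketch is a toy
model under stated assumptions, not Hadoop's FileOutputCommitter: paths are
plain, already-resolved strings, `CommitModel` and `commit` are hypothetical
names, and the attempt ID in `main` is made up. Files under the attempt
directory are promoted to the output directory, files elsewhere under
`_temporary` map to `null` (lost when `_temporary` is cleaned up), and files
outside the output tree are left alone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the commit behaviour described above; NOT Hadoop's
// FileOutputCommitter. Paths are plain strings, already resolved.
public class CommitModel {

    // Returns the final location of each written file, or null if it is lost.
    static List<String> commit(String outDir, String attemptDir, List<String> written) {
        String tempRoot = outDir + "/_temporary/";
        List<String> result = new ArrayList<>();
        for (String file : written) {
            if (file.startsWith(attemptDir + "/")) {
                // Task commit: promote the file from _$taskid into $mapred.out.dir.
                result.add(outDir + file.substring(attemptDir.length()));
            } else if (file.startsWith(tempRoot)) {
                // Inside _temporary but outside _$taskid: never promoted, and
                // gone once the _temporary tree is cleaned up.
                result.add(null);
            } else {
                // Outside the output tree: untouched by commit or cleanup, so it
                // survives, but gets none of the speculative-execution handling.
                result.add(file);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String out = "/tmp/multi1/out";
        // Hypothetical attempt ID, for illustration only.
        String attempt = out + "/_temporary/_attempt_201201010000_0001_r_000000_0";
        List<String> written = Arrays.asList(
                attempt + "/list1-r-00000",
                out + "/_temporary/extra/-r-00000",
                "/tmp/multi1/extra/-r-00000");
        List<String> finalPaths = commit(out, attempt, written);
        for (int i = 0; i < written.size(); i++) {
            System.out.println(written.get(i) + " -> " + finalPaths.get(i));
        }
    }
}
```

In `main`, the first file is promoted to /tmp/multi1/out/list1-r-00000, the
second maps to null, and the third stays at /tmp/multi1/extra/-r-00000 without
any commit protection.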

Because of this, I think Priyo's suggestion of logging a warning makes sense.
There is a caveat: when the MO.write(K, V, NamedOutputPath) method is used, the
warning would be logged only in the log of the task that creates the named
output with an absolute path.
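
A minimal sketch of the suggested warning, assuming it is a purely lexical
check on the base output path performed in the task that calls write().
`NamedOutputWarning`, `mayEscapeWorkDir` and `warnIfEscapes` are hypothetical
names, not Hadoop API, and java.util.logging stands in for Hadoop's logging.
Because the check runs inside the task, the warning ends up in that task's log,
which is exactly the caveat above.

```java
import java.util.logging.Logger;

// Hypothetical helper sketching the suggested warning; not part of Hadoop.
public class NamedOutputWarning {

    private static final Logger LOG =
            Logger.getLogger(NamedOutputWarning.class.getName());

    // True if baseOutputPath may place the file outside _$taskid. Conservative:
    // any ".." segment is flagged, even "a/../b", which would stay inside.
    static boolean mayEscapeWorkDir(String baseOutputPath) {
        if (baseOutputPath.startsWith("/") || baseOutputPath.contains("://")) {
            return true; // absolute path or fully qualified URI
        }
        for (String segment : baseOutputPath.split("/")) {
            if (segment.equals("..")) {
                return true;
            }
        }
        return false;
    }

    // Runs inside the task calling write(), so the warning lands in that task's log.
    static void warnIfEscapes(String baseOutputPath) {
        if (mayEscapeWorkDir(baseOutputPath)) {
            LOG.warning("Named output '" + baseOutputPath
                    + "' is outside the task work directory; it will not be"
                    + " committed with the task output.");
        }
    }

    public static void main(String[] args) {
        warnIfEscapes("../extra/");
        warnIfEscapes("/tmp/multi1/extra/");
        warnIfEscapes("list1"); // no warning
    }
}
```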

> MultipleOutputs output lost if baseOutputPath starts with ../
> -------------------------------------------------------------
>
>                 Key: MAPREDUCE-3772
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3772
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.20.2
>            Reporter: Radim Kolar
>            Assignee: Harsh J
>         Attachments: MAPREDUCE-3772.patch
>
>
> Let's say you have the output directory set:
> FileOutputFormat.setOutputPath(job, new Path("/tmp/multi1/out"));
> and want to place output from MultipleOutputs into /tmp/multi1/extra.
> I expect the following code to work:
> mos = new MultipleOutputs<Text, IntWritable>(context);
> mos.write(new Text("zrr"), value, "../extra/");
> but no exception is thrown and the expected output directory /tmp/multi1/extra
> does not even exist. All data written to this output vanishes without a trace.
> To make it work, the full path must be used:
> mos.write(new Text("zrr"), value, "/tmp/multi1/extra/");
> The output is listed correctly in the MultipleOutputs statistics:
>         org.apache.hadoop.mapreduce.lib.output.MultipleOutputs
>                 ../gaja1/=13333 (* everything is lost *)
>                 /tmp/multi1/out/../ksd34/=13333 (* this, using the full path, works *)
>                 list1=6667
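
To see where the reported data goes, this Java sketch (standard library only)
resolves each base path from the report against a task attempt work directory.
It assumes the file name is the base path plus a suffix such as `-r-00000`, and
that the name is resolved against the work directory with URI semantics, as
Hadoop's Path(parent, child) does; the attempt ID, the suffix and the class and
method names are illustrative assumptions. Under these assumptions `../extra/`
lands in /tmp/multi1/out/_temporary/extra/, inside `_temporary` but outside the
*_$taskid* directory, so it is never committed, while the absolute paths land
outside the output tree altogether.

```java
import java.net.URI;

// Sketch of where a named output lands, ASSUMING the file name is
// baseOutputPath + "-r-00000" and that it is resolved against the task
// attempt work directory with URI semantics (as Hadoop's Path(parent, child)
// does). Plain names only: characters needing URI escaping are not handled.
public class NamedOutputResolution {

    static final String SUFFIX = "-r-00000"; // assumed suffix for reducer 0

    static String resolve(String workDir, String baseOutputPath) {
        // Trailing slash so the work dir is treated as a directory.
        URI base = URI.create(workDir.endsWith("/") ? workDir : workDir + "/");
        return base.resolve(baseOutputPath + SUFFIX).normalize().getPath();
    }

    public static void main(String[] args) {
        // Hypothetical attempt ID, for illustration only.
        String work = "/tmp/multi1/out/_temporary/_attempt_201201010000_0001_r_000000_0";
        String[] bases = {"list1", "../extra/", "/tmp/multi1/extra/", "/tmp/multi1/out/../ksd34/"};
        for (String b : bases) {
            System.out.println(b + " -> " + resolve(work, b));
        }
    }
}
```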

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
