[
https://issues.apache.org/jira/browse/MAPREDUCE-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536133#comment-13536133
]
Alejandro Abdelnur commented on MAPREDUCE-3772:
-----------------------------------------------
MultipleOutputs was implemented to work properly when speculative execution
is enabled with FileOutputFormat implementations (Text, SequenceFile).
To handle speculative execution, FileOutputFormats write their output to the
path *$mapred.out.dir/\_temporary/\_$taskid* during execution. If
speculative execution is in progress for a given task, there will be two task
IDs for it, which means that while the 'competing' tasks are running, their
outputs go to different directories. When the first speculative task completes,
its output is committed (moved to *$mapred.out.dir*) and the second
speculative task is discarded, along with its output. MultipleOutputs
creates the named outputs under the *_$taskid* directory, thus leveraging all
the speculative execution functionality and behavior implemented by
FileOutputFormat. If the named output file is not within the *_$taskid*
directory, none of the logic just described works, because the task commit
procedure only moves files from the *_$taskid* directory to *$mapred.out.dir*.
Because of this, I think that Priyo's suggestion of logging a warning makes sense.
There is a caveat: with the MO.write(K,V,NamedOutputPath) method, the
warning would be logged in the log of the task that creates the named output
with an absolute path.
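To illustrate the check such a warning could be based on, here is a minimal
sketch using only java.nio (no Hadoop classes). It assumes the named output
path is resolved against the *$mapred.out.dir/\_temporary/\_$taskid* directory
described above. The class name, the helper names, and the attempt ID are
hypothetical and are not part of the attached patch.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class NamedOutputPathCheck {

    // Builds $mapred.out.dir/_temporary/_$taskid for a given task attempt.
    static Path taskWorkDir(String outputDir, String taskAttemptId) {
        return Paths.get(outputDir, "_temporary", "_" + taskAttemptId);
    }

    // Resolves the named output path against the task work directory and
    // collapses any ".." segments. An absolute path replaces the work dir.
    static Path resolveNamedOutput(Path workDir, String baseOutputPath) {
        return workDir.resolve(baseOutputPath).normalize();
    }

    // True only if the named output ends up under the task work directory,
    // which is the only place the task commit procedure moves files from.
    static boolean staysInTaskDir(Path workDir, String baseOutputPath) {
        return resolveNamedOutput(workDir, baseOutputPath)
                .startsWith(workDir.normalize());
    }

    public static void main(String[] args) {
        // Hypothetical attempt ID, for illustration only.
        Path workDir = taskWorkDir("/tmp/multi1/out", "attempt_0001_r_000000_0");
        String[] candidates = {"list1", "../extra/", "/tmp/multi1/extra/"};
        for (String base : candidates) {
            if (!staysInTaskDir(workDir, base)) {
                System.err.println("WARN: named output '" + base
                        + "' resolves to " + resolveNamedOutput(workDir, base)
                        + ", which is outside " + workDir
                        + "; the task commit will not handle it");
            }
        }
    }
}
```

Under these assumptions, "list1" stays inside the task directory, while both
"../extra/" and the absolute path fall outside it and would trigger the warning.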
> MultipleOutputs output lost if baseOutputPath starts with ../
> -------------------------------------------------------------
>
> Key: MAPREDUCE-3772
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3772
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: client
> Affects Versions: 0.20.2
> Reporter: Radim Kolar
> Assignee: Harsh J
> Attachments: MAPREDUCE-3772.patch
>
>
> Let's say you have the output directory set:
> FileOutputFormat.setOutputPath(job, new Path("/tmp/multi1/out"));
> and want to place output from MultipleOutputs into /tmp/multi1/extra
> I expect the following code to work:
> mos = new MultipleOutputs<Text, IntWritable>(context);
> mos.write(new Text("zrr"), value, "../extra/");
> but no exception is thrown and the expected output directory /tmp/multi1/extra
> does not even exist. All data written to this output vanishes without a trace.
> To make it work, the full path must be used:
> mos.write(new Text("zrr"), value, "/tmp/multi1/extra/");
> Output is listed in statistics from MultipleOutputs correctly:
> org.apache.hadoop.mapreduce.lib.output.MultipleOutputs
> ../gaja1/=13333 (* everything is lost *)
> /tmp/multi1/out/../ksd34/=13333 (* this using full path works *)
> list1=6667