[ https://issues.apache.org/jira/browse/MAPREDUCE-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506107#comment-13506107 ]
Priyo Mustafi commented on MAPREDUCE-3772:
------------------------------------------
MultipleOutputs exposes two methods:
1) public <K,V> void write(String namedOutput, K key, V value)
2) public <K,V> void write(String namedOutput, K key, V value, String baseOutputPath)
where
namedOutput - the named output name
baseOutputPath - base-output path to write the record to. Note: the framework will generate a unique filename for the baseOutputPath.
We use the second one, which lets you provide a baseOutputPath where the data
should be written. I don't see anywhere in the javadoc that says baseOutputPath
must not be a fully qualified path, so the JIRA is definitely valid. Either the
javadoc or the code needs to be fixed, and I would prefer the latter, as we have
built extensive data pipelines on this behavior. If it is not fixed, we will have
to change the absolute paths to subdirectory paths and, once the job is done,
move all those directories out to their expected locations.
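The post-job workaround described above could look roughly like the following minimal sketch. It assumes each named output is written under a relative subdirectory of the job output directory and that a table maps each subdirectory to its intended absolute location. java.nio on a local filesystem stands in for Hadoop here (on HDFS the analogous call would be FileSystem.rename), and the class RelocateOutputs and its relocate method are hypothetical names, not part of Hadoop.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

// Hypothetical helper sketching the workaround: after the job finishes, move each
// relative subdirectory written under the job output directory to its intended
// absolute location. Assumption: java.nio on a local filesystem stands in for
// Hadoop's FileSystem (whose rename(Path, Path) plays the same role on HDFS).
final class RelocateOutputs {

    /**
     * @param jobOutputDir the directory passed to FileOutputFormat.setOutputPath
     * @param targets      relative baseOutputPath directory -> intended absolute directory
     */
    static void relocate(Path jobOutputDir, Map<String, Path> targets) throws IOException {
        for (Map.Entry<String, Path> entry : targets.entrySet()) {
            Path src = jobOutputDir.resolve(entry.getKey());
            if (!Files.isDirectory(src)) {
                continue; // this named output produced no records
            }
            Path dst = entry.getValue();
            Path parent = dst.getParent();
            if (parent != null) {
                Files.createDirectories(parent); // ensure the target's parent exists
            }
            // A plain rename: throws FileAlreadyExistsException if dst already exists,
            // so output from an earlier run is never silently overwritten.
            Files.move(src, dst);
        }
    }
}
```

Refusing to overwrite an existing target is a deliberate choice in this sketch; a real pipeline might instead fail the job or move the old directory aside first.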
Aside from that, if we provide baseOutputPath as "abc/def/xyz", the directory is
placed under the main output directory, i.e. you get files like
<main-output-dir>/abc/def/xyz-r-00000. If instead you use "/abc/def/xyz" as
baseOutputPath, a path that is not a subdirectory of the main output directory,
the problem appears.
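One plausible reading of both reports, sketched below under stated assumptions: MultipleOutputs appears to turn baseOutputPath into a file name by appending a -r-NNNNN suffix (as in xyz-r-00000 above) and resolving that name against the task attempt's temporary work directory, whose contents are later committed into the main output directory. The sketch models this with java.nio.file.Path as a stand-in for Hadoop's own Path class. The _temporary/_attempt_0001_r_000000_0 layout, the class BaseOutputPathResolution and its Fate enum are hypothetical illustrations, not Hadoop code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Rough, hypothetical model of where a baseOutputPath ends up.
// Assumption: the unique file name is resolved against a task-attempt work
// directory under <output>/_temporary; java.nio paths stand in for
// org.apache.hadoop.fs.Path, and the directory names below are made up.
final class BaseOutputPathResolution {

    enum Fate { COMMITTED_UNDER_OUTPUT_DIR, LEFT_IN_TEMPORARY, WRITTEN_DIRECTLY }

    static final Path OUTPUT_DIR = Paths.get("/tmp/multi1/out");
    static final Path TEMP_DIR = OUTPUT_DIR.resolve("_temporary");
    // Hypothetical task-attempt work directory.
    static final Path WORK_DIR = TEMP_DIR.resolve("_attempt_0001_r_000000_0");

    // Appends the "-r-NNNNN" suffix seen in file names such as xyz-r-00000.
    static String uniqueFile(String baseOutputPath, int partition) {
        return String.format("%s-r-%05d", baseOutputPath, partition);
    }

    // Path.resolve returns its argument unchanged when that argument is absolute,
    // so an absolute baseOutputPath escapes the work directory entirely.
    static Path resolve(String baseOutputPath, int partition) {
        return WORK_DIR.resolve(uniqueFile(baseOutputPath, partition)).normalize();
    }

    static Fate fate(String baseOutputPath) {
        Path p = resolve(baseOutputPath, 0);
        if (p.startsWith(WORK_DIR)) {
            return Fate.COMMITTED_UNDER_OUTPUT_DIR; // promoted on task commit
        }
        if (p.startsWith(TEMP_DIR)) {
            return Fate.LEFT_IN_TEMPORARY; // outside the attempt dir, never promoted
        }
        return Fate.WRITTEN_DIRECTLY; // bypasses the commit step altogether
    }

    public static void main(String[] args) {
        for (String base : new String[] {"abc/def/xyz", "../extra/", "/tmp/multi1/extra/"}) {
            System.out.println(base + " -> " + resolve(base, 0) + " : " + fate(base));
        }
    }
}
```

Under these assumptions, "../extra/" normalizes to a location inside _temporary but outside the attempt directory, so it would never be committed and would plausibly vanish when the temporary tree is cleaned up. An absolute path bypasses the work directory, which would let it appear to work while also skipping the commit step, so failed or speculative attempts could leave stray files there.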
> MultipleOutputs output lost if baseOutputPath starts with ../
> -------------------------------------------------------------
>
> Key: MAPREDUCE-3772
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3772
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv1
> Affects Versions: 0.20.203.0, 0.22.0
> Environment: FreeBSD
> Reporter: Radim Kolar
>
> Let's say you have the output directory set:
> FileOutputFormat.setOutputPath(job, new Path("/tmp/multi1/out"));
> and want to place output from MultipleOutputs into /tmp/multi1/extra.
> I expect the following code to work:
> mos = new MultipleOutputs<Text, IntWritable>(context);
> mos.write(new Text("zrr"), value, "../extra/");
> but no exception is thrown, and the expected output directory /tmp/multi1/extra
> does not even exist. All data written to this output vanishes without a trace.
> To make it work, the full path must be used:
> mos.write(new Text("zrr"), value, "/tmp/multi1/extra/");
> The output is listed correctly in the statistics from MultipleOutputs:
> org.apache.hadoop.mapreduce.lib.output.MultipleOutputs
> ../gaja1/=13333 (* everything is lost *)
> /tmp/multi1/out/../ksd34/=13333 (* this using full path works *)
> list1=6667
--
This message is automatically generated by JIRA.