[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506107#comment-13506107
 ] 

Priyo Mustafi commented on MAPREDUCE-3772:
------------------------------------------

MultipleOutputs exposes two methods.
  1) public <K,V> void write(String namedOutput,K key,V value)
  2) public <K,V> void write(String namedOutput,K key,V value,String baseOutputPath)
where
  namedOutput - the named output name
  baseOutputPath - base-output path to write the record to. Note: Framework 
will generate unique filename for the baseOutputPath 
  
We use the second one, which lets you provide a baseOutputPath where the 
data needs to be written.  Nothing in the Javadoc says that baseOutputPath 
must not be a fully qualified path, so the Jira is definitely valid.  Either 
the Javadoc or the code needs to be fixed, and I would prefer the latter, as 
we have developed extensive data pipelines based on this.  If it is not fixed, 
we will have to change the absolute paths to sub-directory paths and then, 
once the job is done, move all those directories out to the expected 
locations.
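
A minimal sketch of that fallback, for illustration only. It uses java.nio on 
a local filesystem as a stand-in; on HDFS the move would presumably go through 
Hadoop's FileSystem.rename instead. The class name, the method name and the 
sub-directory-to-destination map are all hypothetical and not part of our 
pipeline.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

public class MoveOutputsAfterJob {
    /**
     * After the job has finished, move each sub-directory written under the
     * main output directory to its intended final location.
     * Keys are sub-directory names relative to mainOutputDir (hypothetical),
     * values are the final destinations.
     */
    static void moveOutputs(Path mainOutputDir, Map<String, Path> destinations)
            throws IOException {
        for (Map.Entry<String, Path> entry : destinations.entrySet()) {
            Path source = mainOutputDir.resolve(entry.getKey());
            if (!Files.isDirectory(source)) {
                continue; // nothing was written for this output
            }
            Path target = entry.getValue();
            if (target.getParent() != null) {
                Files.createDirectories(target.getParent());
            }
            // Fails if target already exists; assumes source and target
            // live on the same filesystem so the move is a rename.
            Files.move(source, target);
        }
    }
}
```

The extra move step runs outside the job, so a failure between job completion 
and the move leaves data in the sub-directories rather than the expected 
locations.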

Aside from that, if we provide baseOutputPath as "abc/def/xyz", the directory 
is placed under the main output directory, i.e. you get files like 
<main-output-dir>/abc/def/xyz-r-00000.  If you instead use baseOutputPath as 
"/abc/def/xyz", where the path isn't a subdirectory of the main output 
directory, then the problem is seen.  
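
To make the three cases concrete, here is a small sketch that resolves a 
baseOutputPath against an output directory with plain java.net.URI rules and 
appends a reducer-style suffix.  It is not Hadoop's code: the class and method 
names are made up, and the real framework may resolve relative paths against a 
per-task working directory rather than the final output directory, which this 
sketch does not model.

```java
import java.net.URI;
import java.util.Locale;

public class BaseOutputPathDemo {
    /**
     * Resolve baseOutputPath against outputDir using generic URI resolution,
     * then append a suffix of the form "-r-00000" for the given partition.
     * Illustration only; this does not call or reproduce Hadoop internals.
     */
    static String resolve(String outputDir, String baseOutputPath, int partition) {
        // A trailing slash makes the last segment a directory for resolution.
        URI base = URI.create(outputDir.endsWith("/") ? outputDir : outputDir + "/");
        URI resolved = base.resolve(baseOutputPath);
        return resolved.getPath() + String.format(Locale.ROOT, "-r-%05d", partition);
    }

    public static void main(String[] args) {
        String out = "/tmp/multi1/out";
        // Relative path: stays under the main output directory.
        System.out.println(resolve(out, "abc/def/xyz", 0));   // /tmp/multi1/out/abc/def/xyz-r-00000
        // Absolute path: the output directory is ignored entirely.
        System.out.println(resolve(out, "/abc/def/xyz", 0));  // /abc/def/xyz-r-00000
        // "../" path: escapes to a sibling of the output directory.
        System.out.println(resolve(out, "../extra/part", 0)); // /tmp/multi1/extra/part-r-00000
    }
}
```

The second and third cases are the ones that land outside the main output 
directory, which is where the lost output is reported.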



                
> MultipleOutputs output lost if baseOutputPath starts with ../
> -------------------------------------------------------------
>
>                 Key: MAPREDUCE-3772
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3772
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.203.0, 0.22.0
>         Environment: FreeBSD
>            Reporter: Radim Kolar
>
> Let's say you have the output directory set:
> FileOutputFormat.setOutputPath(job, new Path("/tmp/multi1/out"));
> and want to place output from MultipleOutputs into /tmp/multi1/extra.
> I expect the following code to work:
> mos = new MultipleOutputs<Text, IntWritable>(context);
> mos.write(new Text("zrr"), value, "../extra/");
> but no exception is thrown and the expected output directory 
> /tmp/multi1/extra does not even exist. All data written to this output 
> vanishes without a trace.
> To make it work, the full path must be used:
> mos.write(new Text("zrr"), value, "/tmp/multi1/extra/");
> Output is listed in statistics from MultipleOutputs correctly:
>         org.apache.hadoop.mapreduce.lib.output.MultipleOutputs
>                 ../gaja1/=13333 (* everything is lost *)
>                 /tmp/multi1/out/../ksd34/=13333 (* this using full path works *)
>                 list1=6667

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
