Thanks, both of you, for the comments. Although I finally managed to get the
output file of the current mapper, I couldn't use it because, apparently,
mappers write into a "_temporary" directory while they are running. So in
Mapper.close(), the file it wrote to (e.g. "part-00000") does not exist yet.
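Joey's suggestion (quoted below) boils down to string manipulation on the task attempt ID. Here is a minimal sketch, assuming the default "part" base name and five-digit zero padding used by Hadoop's FileOutputFormat. PartFileName and its methods are hypothetical helpers, not a Hadoop API. It covers both the old mapred API naming (part-00005, which matches the part-00000 names seen with close()) and the new mapreduce API naming (part-m-00005):

```java
import java.util.Locale;

/**
 * Hypothetical helper (not a Hadoop API): derives a task's default output
 * file name from its task attempt ID string. Assumes the default "part"
 * base name and five-digit zero padding of Hadoop's FileOutputFormat.
 */
public class PartFileName {

    // Splits and validates e.g. "attempt_200707121733_0003_m_000005_0".
    // Expected fields: attempt, <jobtracker id>, <job #>, <m|r>, <task #>, <attempt #>
    private static String[] fields(String attemptId) {
        String[] f = attemptId.split("_");
        if (f.length != 6 || !f[0].equals("attempt")) {
            throw new IllegalArgumentException("Unexpected attempt ID: " + attemptId);
        }
        return f;
    }

    /** Old (org.apache.hadoop.mapred) API naming, e.g. "part-00005". */
    public static String oldApiName(String attemptId) {
        int task = Integer.parseInt(fields(attemptId)[4]); // "000005" -> 5
        return String.format(Locale.ROOT, "part-%05d", task);
    }

    /** New (org.apache.hadoop.mapreduce) API naming, e.g. "part-m-00005". */
    public static String newApiName(String attemptId) {
        String[] f = fields(attemptId);
        // f[3] is the task type letter: "m" for map, "r" for reduce
        return String.format(Locale.ROOT, "part-%s-%05d", f[3], Integer.parseInt(f[4]));
    }

    public static void main(String[] args) {
        String id = "attempt_200707121733_0003_m_000005_0";
        System.out.println(oldApiName(id)); // part-00005
        System.out.println(newApiName(id)); // part-m-00005
    }
}
```

This assumes the map's partition number equals its task number, which holds for the default naming; a custom OutputFormat can choose different names. In a real task it is probably safer to parse the ID with Hadoop's own TaskAttemptID class rather than splitting strings by hand.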

There has to be another way to get at the file it produces. I need to sort it
immediately, within the mapper.
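One way to read the file before it is committed is to open it where the task is actually writing it. A minimal sketch, assuming the Hadoop 0.20 FileOutputCommitter layout `<output dir>/_temporary/_<task attempt ID>/<file name>`; the layout, the WorkFilePath class, and the example output directory are assumptions for illustration, not a Hadoop API:

```java
/**
 * Hypothetical sketch (not a Hadoop API): where a task's output file sits
 * while the task is still running, assuming the Hadoop 0.20
 * FileOutputCommitter layout
 *   <output dir>/_temporary/_<task attempt ID>/<file name>
 * The file is only moved to <output dir>/<file name> when the task's
 * output is committed, after close()/cleanup() has returned.
 */
public class WorkFilePath {

    // Assumed name of the committer's scratch directory
    static final String TEMP_DIR_NAME = "_temporary";

    // Drops one trailing slash so paths are not joined with "//"
    private static String trimSlash(String dir) {
        return dir.endsWith("/") ? dir.substring(0, dir.length() - 1) : dir;
    }

    /** Path the running task writes to (what close()/cleanup() would open). */
    public static String workFile(String outputDir, String attemptId, String fileName) {
        return trimSlash(outputDir) + "/" + TEMP_DIR_NAME + "/_" + attemptId + "/" + fileName;
    }

    /** Path the same file is expected to end up at after the task commits. */
    public static String committedFile(String outputDir, String fileName) {
        return trimSlash(outputDir) + "/" + fileName;
    }

    public static void main(String[] args) {
        String out = "/user/mark/output"; // hypothetical job output directory
        String id = "attempt_200707121733_0003_m_000005_0";
        System.out.println(workFile(out, id, "part-00005"));
        System.out.println(committedFile(out, "part-00005"));
    }
}
```

Rather than hard-coding this layout in a real job, it is likely more robust to ask Hadoop for the directory: the old API's FileOutputFormat.getWorkOutputPath(JobConf) returns the task's temporary output directory, and in the new API the getDefaultWorkFile method Luca points to returns the full file path.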

Again, your thoughts are really helpful!

Mark

On Mon, May 23, 2011 at 5:51 AM, Luca Pireddu <[email protected]> wrote:

>
>
> The path is defined by the FileOutputFormat in use. In particular, I think
> this function is responsible:
>
> http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapreduce/lib/output/FileOutputFormat.html#getDefaultWorkFile(org.apache.hadoop.mapreduce.TaskAttemptContext, java.lang.String)
>
> It should give you the file path before all tasks have completed and the
> output is committed to the final output path.
>
> Luca
>
> On May 23, 2011 14:42:04 Joey Echeverria wrote:
> > Hi Mark,
> >
> > FYI, I'm moving the discussion over to
> > [email protected] since your question is specific to
> > MapReduce.
> >
> > You can derive the output file name from the TaskAttemptID, which you can
> > get by calling getTaskAttemptID() on the context passed to your
> > cleanup() function. The task attempt ID will look like this:
> >
> > attempt_200707121733_0003_m_000005_0
> >
> > You're interested in the m_000005 part. This gets translated into the
> > output file name part-m-00005.
> >
> > -Joey
> >
> > On Sat, May 21, 2011 at 8:03 PM, Mark question <[email protected]> wrote:
> > > Hi,
> > >
> > >  I'm running a map-only job, and at the end of each map (i.e. in the
> > > close() function) I want to open the file that the current map has
> > > written using its OutputCollector.
> > >
> > >  I know "job.getWorkingDirectory()" would give me the parent path of the
> > > file written, but how do I get the full path or the name (i.e. part-00000
> > > or part-00001)?
> > >
> > > Thanks,
> > > Mark
>
> --
> Luca Pireddu
> CRS4 - Distributed Computing Group
> Loc. Pixina Manna Edificio 1
> Pula 09010 (CA), Italy
> Tel:  +39 0709250452
>
