Thanks to both of you for the comments. I finally managed to get the path of the current mapper's output file, but I couldn't use it, because apparently the mapper writes under a "_temporary" path while the task is in progress. So in Mapper.close, the file it wrote to (e.g. "part-00000") does not exist yet at the final location.
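A minimal sketch of where the in-progress file would be expected, assuming the 0.20-era FileOutputCommitter layout of `<output dir>/_temporary/_<task attempt id>/<part file>`. The class and method names here are hypothetical, and the layout is an assumption that should be checked against the Hadoop version in use. With the old `org.apache.hadoop.mapred` API, `FileOutputFormat.getWorkOutputPath(JobConf)` should return the same directory without any string handling.

```java
/** Hypothetical helper: builds the expected in-progress path of a task's output file. */
final class WorkFilePath {
    private WorkFilePath() {}

    /**
     * Assumed layout (0.20-era FileOutputCommitter):
     *   outputDir/_temporary/_attemptId/partName
     */
    static String inProgressPath(String outputDir, String attemptId, String partName) {
        // Drop one trailing slash so the result has no double separator.
        String base = outputDir.endsWith("/")
                ? outputDir.substring(0, outputDir.length() - 1)
                : outputDir;
        return base + "/_temporary/_" + attemptId + "/" + partName;
    }

    public static void main(String[] args) {
        System.out.println(inProgressPath(
                "/user/mark/out",
                "attempt_200707121733_0003_m_000005_0",
                "part-00005"));
    }
}
```

Even if the path is right, the record writer may still have the file open when close() runs, so the contents are not guaranteed to be complete at that point.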
There has to be another way to get the produced file. I need to sort it immediately within the mappers.

Again, your thoughts are really helpful!

Mark

On Mon, May 23, 2011 at 5:51 AM, Luca Pireddu <[email protected]> wrote:

> The path is defined by the FileOutputFormat in use. In particular, I think
> this function is responsible:
>
> http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapreduce/lib/output/FileOutputFormat.html#getDefaultWorkFile(org.apache.hadoop.mapreduce.TaskAttemptContext, java.lang.String)
>
> It should give you the file path before all tasks have completed and the
> output is committed to the final output path.
>
> Luca
>
> On May 23, 2011 14:42:04 Joey Echeverria wrote:
> > Hi Mark,
> >
> > FYI, I'm moving the discussion over to
> > [email protected] since your question is specific to
> > MapReduce.
> >
> > You can derive the output name from the TaskAttemptID, which you can
> > get by calling getTaskAttemptID() on the context passed to your
> > cleanup() function. The task attempt id will look like this:
> >
> > attempt_200707121733_0003_m_000005_0
> >
> > You're interested in the m_000005 part. This gets translated into the
> > output file name part-m-00005.
> >
> > -Joey
> >
> > On Sat, May 21, 2011 at 8:03 PM, Mark question <[email protected]> wrote:
> > > Hi,
> > >
> > > I'm running a map-only job, and at the end of each map
> > > (i.e. in the close() function) I want to open the file that the
> > > current map has written using its output collector.
> > >
> > > I know "job.getWorkingDirectory()" would give me the parent path of
> > > the file written, but how do I get the full path or the name
> > > (i.e. part-00000 or part-00001)?
> > >
> > > Thanks,
> > > Mark
>
> --
> Luca Pireddu
> CRS4 - Distributed Computing Group
> Loc. Pixina Manna Edificio 1
> Pula 09010 (CA), Italy
> Tel: +39 0709250452
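The name derivation Joey describes in the quoted message can be sketched in plain Java, without any Hadoop dependency. The class and method names are hypothetical. The sketch assumes the attempt ID always has the six underscore-separated fields shown in his example and that part numbers are zero-padded to five digits. The `part-m-00005` form is the one he gives for the new `mapreduce` API; the `part-00005` form matches the names in the original question, which uses the old close()/OutputCollector API.

```java
import java.util.Locale;

/** Hypothetical helper: derives a part file name from a task attempt ID string. */
final class PartFileNames {
    private PartFileNames() {}

    /** Splits e.g. attempt_200707121733_0003_m_000005_0 into its six fields. */
    private static String[] fields(String attemptId) {
        String[] f = attemptId.split("_");
        if (f.length != 6 || !f[0].equals("attempt")) {
            throw new IllegalArgumentException("Unexpected attempt ID: " + attemptId);
        }
        return f;
    }

    /** New-API style name, e.g. part-m-00005. */
    static String newApiName(String attemptId) {
        String[] f = fields(attemptId);
        int task = Integer.parseInt(f[4]);   // "000005" -> 5
        return String.format(Locale.ROOT, "part-%s-%05d", f[3], task);
    }

    /** Old-API style name, e.g. part-00005. */
    static String oldApiName(String attemptId) {
        int task = Integer.parseInt(fields(attemptId)[4]);
        return String.format(Locale.ROOT, "part-%05d", task);
    }

    public static void main(String[] args) {
        String id = "attempt_200707121733_0003_m_000005_0";
        System.out.println(newApiName(id));
        System.out.println(oldApiName(id));
    }
}
```

In a real task the ID string would come from getTaskAttemptID() on the context, as Joey suggests, rather than from a literal.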
