The MapReduce program creates one output file per reducer, named "part-xxxxx" by default (for example part-r-00000 with the new API, or part-00000 with the old one).
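To make the default naming concrete, here is a minimal sketch that only reproduces the name pattern. PartFileNames and its methods are hypothetical helpers written for this mail, not Hadoop classes; the sketch assumes the five-digit, zero-padded reducer number used by the default output format.

```java
import java.util.Locale;

// Hypothetical helper (not part of Hadoop): mirrors the default
// naming pattern of reducer output files, nothing more.
final class PartFileNames {

    // New API (org.apache.hadoop.mapreduce): part-r-00000, part-r-00001, ...
    static String newApiName(int reducer) {
        return String.format(Locale.ROOT, "part-r-%05d", reducer);
    }

    // Old API (org.apache.hadoop.mapred): part-00000, part-00001, ...
    static String oldApiName(int reducer) {
        return String.format(Locale.ROOT, "part-%05d", reducer);
    }

    public static void main(String[] args) {
        // Print the names a job with three reducers would produce.
        for (int r = 0; r < 3; r++) {
            System.out.println(newApiName(r));
        }
    }
}
```

So a job with N reducers ends up with N such files in its output directory, one per reducer, without any change to the Hadoop sources.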
-----Original Message-----
From: Pavan Kulkarni [mailto:[email protected]]
Sent: August 19, 2012 23:58
To: [email protected]
Subject: Re: Significance of file.out.index during Shuffle Phase ?

Oh, thanks a lot, Harsh. Exactly what I was looking for.
I wanted to create different file.out's for different reducers, something like file.out.1 for reducer 1, file.out.2 for reducer 2, etc.
Is it possible to do this in the MapReduce program, or do I need to tweak some Hadoop source files for that?
Thanks.

On Sun, Aug 19, 2012 at 7:02 AM, Harsh J <[email protected]> wrote:
> Hey Pavan,
>
> Yes you've got it almost right on how file.out is served to each
> reducer. See the code at
> http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java?view=markup
> (Method under L502:L565 that sends data for a specific
> reduce/partition ID (integer)).
>
> On Sun, Aug 19, 2012 at 9:05 AM, Pavan Kulkarni <[email protected]> wrote:
> > Hi,
> >
> > I was trying to understand how exactly the reducers find out how to
> > fetch the data of their own partition from the Map nodes.
> > During the execution of MapReduce, I see that *file.out* is created
> > on the Map nodes, so my question is: how does a reducer know what part
> > of file.out to fetch? Does *file.out.index* play any role?
> > Any help is appreciated. Thanks
> >
> > --With Regards
> > Pavan Kulkarni
>
> --
> Harsh J

--
--With Regards
Pavan Kulkarni
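
On the original question in the quoted thread (how a reducer's partition is located inside file.out), the idea can be sketched as below. This is a toy model under stated assumptions, not the Hadoop implementation: MapOutputIndexSketch is a hypothetical class, the byte array stands in for file.out, and the two int arrays stand in for file.out.index. The real on-disk format is not reproduced here.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Toy model (hypothetical, not Hadoop code): all partitions of one map
// task are written back to back into a single buffer, and a side index
// records where each partition starts and how long it is.
final class MapOutputIndexSketch {
    final byte[] fileOut; // stand-in for file.out
    final int[] start;    // stand-in for file.out.index: start offset per partition
    final int[] length;   // stand-in for file.out.index: byte length per partition

    MapOutputIndexSketch(String[] partitions) {
        start = new int[partitions.length];
        length = new int[partitions.length];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int p = 0; p < partitions.length; p++) {
            byte[] bytes = partitions[p].getBytes(StandardCharsets.UTF_8);
            start[p] = out.size();      // offset where partition p begins
            length[p] = bytes.length;   // number of bytes in partition p
            out.write(bytes, 0, bytes.length);
        }
        fileOut = out.toByteArray();
    }

    // Serve only the slice that belongs to the given reducer/partition ID.
    String fetch(int reducer) {
        return new String(fileOut, start[reducer], length[reducer], StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        MapOutputIndexSketch m =
            new MapOutputIndexSketch(new String[] {"a=1;", "b=2;c=3;", ""});
        System.out.println(m.fetch(1)); // the bytes meant for reducer 1
    }
}
```

With an index like this, a single file.out per map task is enough: the server side looks up the offset and length for the requested partition ID and sends just that range, so separate file.out.N files per reducer are not needed.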
