Re: Significance of file.out.index during Shuffle Phase?

俞盛朋 Sun, 19 Aug 2012 17:45:07 -0700

A MapReduce job creates one output file per reducer, named
"part-xxxxxx" by default (for example, part-00000 or part-r-00000).

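By contrast, the intermediate map output discussed in the thread below
stays in a single file.out per map task, and file.out.index records where
each reducer's partition sits inside it. Here is a minimal sketch in Python
of that lookup. It assumes the index holds one 24-byte record per partition,
made of three big-endian longs (start offset, raw length, on-disk length),
and it ignores any trailing checksum. The function and variable names are
hypothetical and are not Hadoop APIs.

```python
import struct

# Assumed layout: one record per partition, three big-endian signed 64-bit
# integers: start offset, raw (uncompressed) length, on-disk part length.
RECORD = struct.Struct(">qqq")


def build_index(partition_lengths):
    """Build an in-memory index for partitions laid out back to back."""
    index = bytearray()
    offset = 0
    for length in partition_lengths:
        # No compression in this sketch, so raw length == on-disk length.
        index += RECORD.pack(offset, length, length)
        offset += length
    return bytes(index)


def locate_partition(index, reduce_id):
    """Return (start_offset, raw_length, part_length) for one reducer."""
    return RECORD.unpack_from(index, reduce_id * RECORD.size)


def read_partition(map_output, index, reduce_id):
    """Slice one reducer's bytes out of the single map output blob."""
    start, _raw, part = locate_partition(index, reduce_id)
    return map_output[start:start + part]


# Hypothetical map output holding three partitions in one file.
map_output = b"aaaa" + b"bb" + b"cccccc"
index = build_index([4, 2, 6])
```

With this toy data, read_partition(map_output, index, 1) returns b"bb".
Because the reduce ID alone selects the record, one file.out can serve
every reducer without being split into per-reducer files.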

-----Original Message-----
From: Pavan Kulkarni [mailto:[email protected]]
Sent: August 19, 2012 23:58
To: [email protected]
Subject: Re: Significance of file.out.index during Shuffle Phase?

Oh, thanks a lot, Harsh. Exactly what I was looking for.
I wanted to create a different file.out for each reducer, something like
file.out.1 for reducer 1, file.out.2 for reducer 2, etc. Is it possible to
do this in the MapReduce program, or do I need to tweak some Hadoop source
files for that? Thanks.

On Sun, Aug 19, 2012 at 7:02 AM, Harsh J <[email protected]> wrote:

> Hey Pavan,
>
> Yes, you've got it almost right on how file.out is served to each
> reducer. See the code at
>
> http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java?view=markup
> (the method at L502:L565, which sends data for a specific
> reduce/partition ID, an integer).
>
> On Sun, Aug 19, 2012 at 9:05 AM, Pavan Kulkarni <[email protected]> wrote:
> > Hi,
> >
> >   I was trying to understand how exactly a reducer finds out how to
> > fetch the data of its own partition from the map nodes.
> > During the execution of a MapReduce job, I see that *file.out* is
> > created on the map nodes, so my question is: how does a reducer know
> > what part of file.out to fetch? Does *file.out.index* play any role?
> > Any help is appreciated. Thanks.
> >
> >
> >
> > --With Regards
> > Pavan Kulkarni
>
>
>
> --
> Harsh J
>



-- 

--With Regards
Pavan Kulkarni
