Amusingly this is almost the same question that was asked the other day :)

<quote from Owen O'Malley>
There isn't currently a way of getting a collated, but unsorted list of 
key/value pairs. For most applications, the in memory sort is fairly cheap 
relative to the shuffle and other parts of the processing.
</quote>

If you know that you will be filtering out a significant amount of information, 
to the point where the shuffle will be trivial, then the cost of a reduce phase 
using an identity reducer should be minimal. The alternative is to aggregate as 
much data as you feel comfortable with into each split and have 1 file per map. 
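Here is what the identity-reducer option looks like in practice. With N reduce tasks the job writes N part files, and by default each map output record goes to a reducer chosen by Hadoop's HashPartitioner. The sketch below is plain Java with no Hadoop dependency. It only mimics that partitioning formula, so you can see how records spread over, say, 100 output files. The key names are made up. In a real job you would set the count with setNumReduceTasks on the Job (or JobConf) and keep the default partitioner.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of how the default HashPartitioner routes map output keys
// to reduce tasks. Not Hadoop code; keys below are hypothetical.
class PartitionSketch {
    // Same formula as HashPartitioner: non-negative hash modulo reducer count.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Counts how many records would land in each reducer's output file.
    static int[] recordsPerReducer(List<String> keys, int numReduceTasks) {
        int[] counts = new int[numReduceTasks];
        for (String key : keys) {
            counts[partitionFor(key, numReduceTasks)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            keys.add("record-" + i); // hypothetical keys
        }
        int[] counts = recordsPerReducer(keys, 100);
        int nonEmpty = 0;
        for (int c : counts) if (c > 0) nonEmpty++;
        System.out.println(nonEmpty + " of 100 reducers received records");
    }
}
```

The framework still sorts each reducer's input by key. The identity reducer avoids extra processing, but it does not avoid the sort.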

How much data/percentage of input are you assuming will be output from each of 
these maps?

Matt

-----Original Message-----
From: Peng, Wei [mailto:[email protected]] 
Sent: Tuesday, September 20, 2011 10:22 AM
To: [email protected]
Subject: RE: how to set the number of mappers with 0 reducers?

Thank you all for the quick reply!!

I think I was wrong. It has nothing to do with the number of mappers,
because each input file is 500M, which is not small relative to the
64M block size.
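This rough check shows why 500M inputs are not the problem. For a splittable, uncompressed file, the new-API FileInputFormat uses a split size of max(minSize, min(maxSize, blockSize)) and lets the last split run up to 10% over. The sketch below re-implements only that arithmetic, with no locality handling and no Hadoop classes. The older mapred API also factors in the requested number of maps, so treat the result as approximate.

```java
// Simplified model of FileInputFormat's split arithmetic for one splittable,
// uncompressed file. Ignores block locations and other per-file details.
class SplitMath {
    // The last split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    // splitSize = max(minSize, min(maxSize, blockSize))
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits (and therefore map tasks) for a file of the given length.
    static int countSplits(long fileLength, long splitSize) {
        int splits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // final split, possibly shorter (or up to 10% longer)
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long splitSize = computeSplitSize(64 * mb, 1, Long.MAX_VALUE);
        System.out.println(countSplits(500 * mb, splitSize)); // 8
    }
}
```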

The problem is that the output from each mapper is too small. Is there a
way to combine some of the mappers' output? Setting the number of
reducers to 1 might produce one very large file. Can I set the number of
reducers to 100 but skip the sorting, shuffling, etc.?

Wei

-----Original Message-----
From: Soumya Banerjee [mailto:[email protected]] 
Sent: Tuesday, September 20, 2011 2:06 AM
To: [email protected]
Subject: Re: how to set the number of mappers with 0 reducers?

Hi,

If you want all your map outputs in a single file, you can use an
IdentityReducer and set the number of reducers to 1.
This ensures that all of your mapper output goes to that one reducer,
which writes it into a single file.

Soumya

On Tue, Sep 20, 2011 at 2:04 PM, Harsh J <[email protected]> wrote:

> Hello Wei!
>
> On Tue, Sep 20, 2011 at 1:25 PM, Peng, Wei <[email protected]> wrote:
> (snip)
> > However, the output from the mappers results in many small files (size is
> > ~50k, the block size is however 64M, so it wastes a lot of space).
> >
> > How can I set the number of mappers (say 100)?
>
> What you're looking for is to 'pack' several files per mapper, if I
> get it right.
>
> In that case, you need to check out the CombineFileInputFormat. It can
> pack several files per mapper (with some degree of locality).
>
> Alternatively, pass a list of files (as a text file) as your input,
> and have your Mapper logic read them one by one. This way, if you
> divide 50k filenames over 100 files, you will get 100 mappers as you
> want - but at the cost of losing almost all locality.
>
> > If there is no way to set the number of mappers, is the only way to solve
> > it to "cat" some files together?
>
> Concatenating is an alternative, if affordable - yes. You can lower
> the file count (down from 50k) this way.
>
> --
> Harsh J
>
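Harsh's first two suggestions both group many files into fewer map tasks. The plain-Java sketch below shows the two groupings under simplifying assumptions. It does not use Hadoop APIs, and the file sizes and paths are hypothetical. pack() fills splits greedily up to a maximum size, roughly what CombineFileInputFormat does when given a maximum split size, but without the node and rack grouping the real class performs. chunkManifests() divides a list of paths into N near-equal lists. Each list would be written out as one small text file for the "list of files as input" approach.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative grouping helpers; plain Java, not Hadoop APIs.
class GroupingSketch {
    // Greedy packing: start a new split when the next file would exceed maxSplitSize.
    // A single file larger than maxSplitSize still gets a split of its own.
    static List<List<Long>> pack(List<Long> fileSizes, long maxSplitSize) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            if (!current.isEmpty() && currentBytes + size > maxSplitSize) {
                splits.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    // Divides paths into at most numManifests contiguous lists whose sizes differ by at most one.
    static List<List<String>> chunkManifests(List<String> paths, int numManifests) {
        if (numManifests <= 0) {
            throw new IllegalArgumentException("numManifests must be positive");
        }
        List<List<String>> manifests = new ArrayList<>();
        int n = paths.size();
        int base = n / numManifests;
        int extra = n % numManifests;
        int start = 0;
        for (int i = 0; i < numManifests && start < n; i++) {
            int len = base + (i < extra ? 1 : 0);
            manifests.add(new ArrayList<>(paths.subList(start, start + len)));
            start += len;
        }
        return manifests;
    }

    public static void main(String[] args) {
        List<Long> sizes = new ArrayList<>();
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 50_000; i++) {
            sizes.add(50L * 1024);                      // hypothetical ~50 KB files
            paths.add("/hypothetical/input/file-" + i); // hypothetical paths
        }
        System.out.println("packed splits: " + pack(sizes, 64L * 1024 * 1024).size());
        List<List<String>> manifests = chunkManifests(paths, 100);
        System.out.println("manifests: " + manifests.size()
                + ", paths in first: " + manifests.get(0).size());
    }
}
```

The manifest approach gives up locality, as Harsh notes, because each mapper opens its listed files wherever it happens to run.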