Thank you all for the quick replies! I think I was wrong. The problem has nothing to do with the number of mappers: each input file is 500 MB, which is not small relative to the 64 MB block size.
The problem is that the output from each mapper is too small. Is there a way to combine the output of several mappers? Setting the number of reducers to 1 might produce one very large file. Can I set the number of reducers to 100 but skip the sorting, shuffling, etc.?

Wei

-----Original Message-----
From: Soumya Banerjee [mailto:[email protected]]
Sent: Tuesday, September 20, 2011 2:06 AM
To: [email protected]
Subject: Re: how to set the number of mappers with 0 reducers?

Hi,

If you want all your map outputs in a single file, you can use an IdentityReducer and set the number of reducers to 1. This would ensure that all your mapper output goes into the reducer and it writes into a single file.

Soumya

On Tue, Sep 20, 2011 at 2:04 PM, Harsh J <[email protected]> wrote:
> Hello Wei!
>
> On Tue, Sep 20, 2011 at 1:25 PM, Peng, Wei <[email protected]> wrote:
> (snip)
> > However, the output from the mappers result in many small files (size is
> > ~50k, the block size is however 64M, so it wastes a lot of space).
> >
> > How can I set the number of mappers (say 100)?
>
> What you're looking for is to 'pack' several files per mapper, if I
> get it right.
>
> In that case, you need to check out the CombineFileInputFormat. It can
> pack several files per mapper (with some degree of locality).
>
> Alternatively, pass a list of files (as a text file) as your input,
> and have your Mapper logic read them one by one. This way, if you
> divide 50k filenames over 100 files, you will get 100 mappers as you
> want - but at the cost of losing almost all locality.
>
> > If there is no way to set the number of mappers, the only way to solve
> > it is "cat" some files together?
>
> Concatenating is an alternative, if affordable - yes. You can lower
> the file count (down from 50k) this way.
>
> --
> Harsh J
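
For reference, the file-list approach Harsh describes in the quoted message (divide the filenames over 100 text files so that each one feeds a single mapper) could be prepared along the following lines. This is only a rough sketch in plain Java with no Hadoop dependency: the class name ManifestSplitter, the method split, and the /data/input/part-N paths are all made up for illustration. It assumes each resulting list is then written out as its own small text file and that the mapper opens the listed files itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Hadoop): groups input paths into per-mapper lists. */
final class ManifestSplitter {

    /**
     * Distributes the given paths round-robin over numManifests lists.
     * Each list is meant to be written out as one text file read by one mapper.
     */
    static List<List<String>> split(List<String> paths, int numManifests) {
        if (numManifests <= 0) {
            throw new IllegalArgumentException("numManifests must be positive");
        }
        List<List<String>> manifests = new ArrayList<>();
        for (int i = 0; i < numManifests; i++) {
            manifests.add(new ArrayList<>());
        }
        // Path i goes to manifest (i mod numManifests), so list sizes differ by at most one.
        for (int i = 0; i < paths.size(); i++) {
            manifests.get(i % numManifests).add(paths.get(i));
        }
        return manifests;
    }

    public static void main(String[] args) {
        // Made-up paths, for illustration only.
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 50_000; i++) {
            paths.add("/data/input/part-" + i);
        }
        List<List<String>> manifests = split(paths, 100);
        System.out.println(manifests.size() + " manifests, "
                + manifests.get(0).size() + " paths in the first");
    }
}
```

With 50,000 paths and 100 lists, each list holds 500 paths. As noted in the quoted message, this trades away almost all data locality, since a mapper no longer runs near the blocks of the files it reads.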
