Hi,

If you want all your map outputs in a single file, you can use an IdentityReducer and set the number of reducers to 1. All of the mapper output then goes to that one reducer, which writes it to a single file.
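
For example, with the old org.apache.hadoop.mapred API the driver could look roughly like this. It is only a sketch: the class name SingleOutputJob, the job name, the use of IdentityMapper as a stand-in for your real mapper, and the input/output paths taken from args are my placeholders, and the key/value classes assume the default TextInputFormat.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

// Hypothetical driver: funnels all map output through one reducer.
public class SingleOutputJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SingleOutputJob.class);
        conf.setJobName("single-output");

        // Placeholder mapper; substitute your own Mapper class here.
        conf.setMapperClass(IdentityMapper.class);

        // The reducer passes every record through unchanged ...
        conf.setReducerClass(IdentityReducer.class);
        // ... and with exactly one reduce task there is one output file.
        conf.setNumReduceTasks(1);

        // Assumes TextInputFormat records: (byte offset, line of text).
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
```

Keep in mind that everything passes through the shuffle to one reduce task, so the records come out sorted by key and the write is not parallel.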
Soumya

On Tue, Sep 20, 2011 at 2:04 PM, Harsh J <[email protected]> wrote:

> Hello Wei!
>
> On Tue, Sep 20, 2011 at 1:25 PM, Peng, Wei <[email protected]> wrote:
> (snip)
> > However, the output from the mappers result in many small files (size is
> > ~50k, the block size is however 64M, so it wastes a lot of space).
> >
> > How can I set the number of mappers (say 100)?
>
> What you're looking for is to 'pack' several files per mapper, if I
> get it right.
>
> In that case, you need to check out the CombineFileInputFormat. It can
> pack several files per mapper (with some degree of locality).
>
> Alternatively, pass a list of files (as a text file) as your input,
> and have your Mapper logic read them one by one. This way, if you
> divide 50k filenames over 100 files, you will get 100 mappers as you
> want - but at the cost of losing almost all locality.
>
> > If there is no way to set the number of mappers, the only way to solve
> > it is "cat" some files together?
>
> Concatenating is an alternative, if affordable - yes. You can lower
> the file count (down from 50k) this way.
>
> --
> Harsh J
>
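
P.S. On Harsh's file-list suggestion quoted above: dividing the filenames over 100 list files can be done with plain Java before the job is submitted. A rough sketch, where the FileListSplitter class, its split method, and the /data/in/part-N paths are all made up for illustration; writing each list to its own text file and having the Mapper open the listed paths is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: spreads input paths over a fixed number of lists,
// one list per intended mapper.
public class FileListSplitter {

    /** Distributes paths round-robin over at most numLists non-empty lists. */
    public static List<List<String>> split(List<String> paths, int numLists) {
        if (numLists <= 0) {
            throw new IllegalArgumentException("numLists must be positive");
        }
        // Never create more lists than there are paths.
        int n = Math.min(numLists, paths.size());
        List<List<String>> lists = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            lists.add(new ArrayList<>());
        }
        for (int i = 0; i < paths.size(); i++) {
            lists.get(i % n).add(paths.get(i));
        }
        return lists;
    }

    public static void main(String[] args) {
        // Made-up paths standing in for the real input files.
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 50000; i++) {
            paths.add("/data/in/part-" + i);
        }
        List<List<String>> lists = split(paths, 100);
        // Prints: 100 lists, 500 paths in the first
        System.out.println(lists.size() + " lists, "
                + lists.get(0).size() + " paths in the first");
    }
}
```

Round-robin keeps the lists within one path of each other in length, but as Harsh notes, a mapper reading files named in a list gets almost no data locality.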
