Hi,
I have a Hadoop job running over 50k files, each of which is about 500MB. I only need to extract a tiny amount of information from each file, so no reducer is needed. However, because the job is map-only, each mapper writes its own output file, which results in many small files (~50KB each, while the HDFS block size is 64MB, so it wastes a lot of space).

How can I set the number of mappers (say, to 100)? If there is no way to set the number of mappers, is the only solution to "cat" some of the output files together afterwards?

Many Thanks,
Wei
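
P.S. In case it helps, here is roughly what my driver and mapper look like. This is a simplified sketch: ExtractDriver, ExtractMapper, and the "MARKER" filter are placeholders for my actual extraction logic, and I'm on the older Hadoop 1.x mapreduce API.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ExtractDriver {

    // Placeholder mapper: emits only the few lines I care about,
    // so each map task produces a tiny (~50KB) output file.
    public static class ExtractMapper
            extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            if (line.toString().contains("MARKER")) {  // stand-in for real extraction
                ctx.write(new Text("record"), line);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "extract tiny info");
        job.setJarByClass(ExtractDriver.class);
        job.setMapperClass(ExtractMapper.class);
        job.setNumReduceTasks(0);  // map-only: one output file per map task
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // dir with ~50k files
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Since there is no reduce phase, every map task writes its own part-* file directly, which is where all the small output files come from.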
