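I found that the following works for me. It uses the new-API FileInputFormat (org.apache.hadoop.mapreduce.lib.input) and caps each input split at 10 KB: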
FileInputFormat.setMaxInputSplitSize(job, 10L * 1024L);
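To show why this changes the mapper count, here is a rough sketch of how I understand the new-API FileInputFormat to cut a splittable file into splits. It is plain Java with no Hadoop dependency. The class name SplitSizeSketch and the method countSplits are made up for illustration, and the 64 MB block size and minimum split size of 1 are assumptions about the cluster, not values taken from your job.

```java
// Sketch of the split arithmetic only; not the actual Hadoop source.
final class SplitSizeSketch {
    // The final split of a file may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    // Split size is the block size, capped by maxSize and floored by minSize.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Counts the splits produced for one splittable file of fileLength bytes.
    static int countSplits(long fileLength, long splitSize) {
        int splits = 0;
        long bytesRemaining = fileLength;
        while ((double) bytesRemaining / splitSize > SPLIT_SLOP) {
            splits++;
            bytesRemaining -= splitSize;
        }
        if (bytesRemaining != 0) {
            splits++; // whatever is left becomes the last split
        }
        return splits;
    }

    public static void main(String[] args) {
        long fileLength = 32L * 1024L * 1024L; // one 32 MB seq file
        long blockSize = 64L * 1024L * 1024L;  // assumed HDFS block size
        long minSize = 1L;                      // assumed minimum split size

        // No maximum set: the whole file fits in one split.
        long unlimited = computeSplitSize(blockSize, minSize, Long.MAX_VALUE);
        System.out.println(countSplits(fileLength, unlimited)); // prints 1

        // Maximum of 10 KB, as in the call above.
        long capped = computeSplitSize(blockSize, minSize, 10L * 1024L);
        System.out.println(countSplits(fileLength, capped)); // prints 3277
    }
}
```

So the maximum split size is applied when the input is read, not when the files are written. Under these assumptions a maximum of 1000000 bytes would give 34 splits per 32 MB file. If the -D option made no difference, my guess is that either the job uses the old org.apache.hadoop.mapred API, which as far as I know does not read mapred.max.split.size, or the driver does not go through ToolRunner, so the generic options are never parsed.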
Kim
On 11/09/2011 04:11 AM, Radim Kolar wrote:
I have 2 input seq files, 32 MB each. I want to run them on as many
mappers as possible.
I appended -D mapred.max.split.size=1000000 as a command-line argument
to the job, but it makes no difference. The job still runs on 2 mappers.
How does split size work? Is the max split size used when reading or
when writing files?
Does it work like this: you set the max split size and write the files,
and you get a bunch of seq files as output; then you get the same number
of mappers as input files?