I have two input sequence files, 32 MB each, and I want to process them with as many mappers as possible.

I appended -D mapred.max.split.size=1000000 as a command-line argument to the job, but it made no difference: the job still runs with 2 mappers.

How does split size work? Is the max split size applied when reading files or when writing them?

Does it work like this: you set the max split size, write the files, and get a bunch of sequence files as output, and then you get the same number of mappers as there are input files?
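
For context, here is a minimal sketch of the split arithmetic as it is usually described for the new-API FileInputFormat (org.apache.hadoop.mapreduce.lib.input), where splits are computed when the input is read, not when it is written. This is not the Hadoop source: the class SplitSizeSketch and its methods are hypothetical, the 64 MB block size and the minimum split size of 1 are assumptions, and the 1.1 slack factor is my understanding of how the last split is handled.

```java
public class SplitSizeSketch {
    // Assumed slack factor: the final split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    // Split size is the block size, capped by maxSize and floored by minSize.
    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Count how many splits one splittable file of the given length would yield.
    public static int countSplits(long fileLength, long splitSize) {
        int splits = 0;
        long remaining = fileLength;
        // Carve off full-size splits while clearly more than one split remains.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        // Whatever is left becomes the final (possibly short) split.
        if (remaining > 0) {
            splits++;
        }
        return splits;
    }

    public static void main(String[] args) {
        long fileLength = 32L * 1024 * 1024; // one 32 MB input file
        long blockSize = 64L * 1024 * 1024;  // assumed block size
        long minSize = 1L;                   // assumed minimum split size
        long maxSize = 1_000_000L;           // value passed via mapred.max.split.size

        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        System.out.println("split size      = " + splitSize);
        System.out.println("splits per file = " + countSplits(fileLength, splitSize));
    }
}
```

Under these assumptions, a 32 MB file with a 1,000,000-byte cap gives 34 splits, while the same file with no cap gives a single split, which matches the 2 mappers observed for 2 files. If the setting has no effect, two things may be worth checking: whether the job driver actually parses -D options (for example through ToolRunner), and whether the job uses the older org.apache.hadoop.mapred API, which as far as I know does not consult the max split size.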
