I am trying to run 8 map tasks and 2 reduce tasks on 3 machines. Each map task processes one 6 MB text file, and there are 500 such files. The monitoring page shows far fewer map tasks running than intended. Sometimes some nodes do not get any tasks assigned at all, even though a large number of files are still waiting to be scheduled for the map phase. Is this caused by how the files are distributed across the nodes? Note that my file system is set to local.
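For reference, a rough capacity estimate for this setup can be sketched in Python. It assumes that mapred.tasktracker.tasks.maximum (set to 4 in the parameters below) caps how many tasks a single TaskTracker runs at once, and that each 6 MB input file becomes exactly one map task. Both are assumptions, and the helper names are hypothetical.

```python
import math

# Numbers from the post. Interpretation (assumption): the per-tracker
# maximum caps concurrent tasks on each machine, and each small input
# file is handled by exactly one map task.
MACHINES = 3
TASKS_PER_TRACKER = 4   # mapred.tasktracker.tasks.maximum
INPUT_FILES = 500       # 6 MB each


def concurrent_slots(machines: int, per_tracker: int) -> int:
    """Upper bound on tasks that can run at the same time across the cluster."""
    return machines * per_tracker


def map_waves(num_maps: int, slots: int) -> int:
    """Rounds of map tasks needed if every slot is kept busy."""
    return math.ceil(num_maps / slots)


slots = concurrent_slots(MACHINES, TASKS_PER_TRACKER)
waves = map_waves(INPUT_FILES, slots)
print(f"at most {slots} concurrent tasks, about {waves} waves of map tasks")
```

Under these assumptions the cluster can run at most 12 tasks at once, so the 500 files would need roughly 42 fully packed rounds.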
Some of the relevant parameters are listed below:

io.sort.factor=100
io.sort.mb=1000
io.file.buffer.size=4096000
io.bytes.checksum=128
mapred.map.tasks=16
mapred.reduce.tasks=2
mapred.tasktracker.tasks.maximum=4
mapred.combine.buffer.size=100000

Is there any parameter I am missing that would let me make full use of all the CPUs?

Thanks,
VJ
