Hi David,
I think Hadoop is looking at the data size, not the number of input files. If
I pass in .gz files, then yes, Hadoop chooses 1 map task per file (gzip isn't
splittable), but whether I pass in one huge text file or the same data split
into 10 files, it chooses the same number of map tasks (191 in my case).
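
As a quick way to see this, here is a small sketch of my own (the class name
SplitCount is just a placeholder; it uses the old "mapred" API this thread is
about) that prints how many splits, and therefore map tasks, a given input
path would produce:

// Hedged sketch: counts the input splits Hadoop would create for a path.
// Assumes the old org.apache.hadoop.mapred API (Hadoop 0.20.x era).
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;

public class SplitCount {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SplitCount.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));

        TextInputFormat inputFormat = new TextInputFormat();
        inputFormat.configure(conf); // sets up compression codecs (e.g. gzip)

        // The second argument is only a hint; FileInputFormat still honours
        // block sizes and non-splittable codecs such as gzip, which is why
        // each .gz file yields exactly one split.
        InputSplit[] splits = inputFormat.getSplits(conf, 1);
        System.out.println("Map tasks that would run: " + splits.length);
    }
}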

Thanks
Praveen

-----Original Message-----
From: ext David Rosenstrauch [mailto:dar...@darose.net] 
Sent: Monday, June 20, 2011 3:39 PM
To: mapreduce-user@hadoop.apache.org
Subject: Re: controlling no. of mapper tasks

On 06/20/2011 03:24 PM, praveen.pe...@nokia.com wrote:
> Hi there, I know the client can send "mapred.reduce.tasks" to specify the
> no. of reduce tasks and Hadoop honours it, but "mapred.map.tasks" is not
> honoured by Hadoop. Is there any way to control the number of map tasks?
> What I noticed is that Hadoop is choosing too many mappers, and the extra
> overhead this adds slows the job down. For example, when I have only 10
> map tasks, my job finishes faster than when Hadoop chooses 191 map tasks.
> I have a 5-slave cluster on which 10 tasks can run in parallel, so I want
> to set both map and reduce tasks to 10 for maximum efficiency.
>
> Thanks Praveen

The number of map tasks is determined dynamically, based on the number of
input splits your data produces (roughly one per HDFS block per file). If you
want fewer map tasks, either pass fewer input files to your job, or store the
files with a larger block size (which will result in fewer splits per file,
and thus fewer splits, and map tasks, overall).
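
You can also raise the minimum split size on the job itself, which packs
several blocks into each split without re-storing the data. A minimal sketch,
assuming the old "mapred" API and a placeholder job class (the 1 GB figure is
just an example; mapred.min.split.size is the stock property in 0.20.x):

// Hedged sketch: fewer map tasks via a larger minimum split size.
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class FewerMapsJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(FewerMapsJob.class);

        // FileInputFormat never creates a split smaller than this, so with
        // 64 MB blocks a 1 GB minimum packs ~16 blocks into each map task.
        // (Data locality suffers a bit, since a split now spans many blocks.)
        conf.setLong("mapred.min.split.size", 1024L * 1024 * 1024);

        // The reduce count, by contrast, is honoured directly.
        conf.setNumReduceTasks(10);

        // ... set mapper, reducer, and input/output paths as usual ...
        JobClient.runJob(conf);
    }
}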

HTH,

DR
