Hello,
I should also point out that I'm using SequenceFileInputFormat.

Regards
Saptarshi Guha

On Tue, Jun 23, 2009 at 10:43 AM, Saptarshi Guha <saptarshi.g...@gmail.com> wrote:
> Hello,
> I'm running a 90-node c1.xlarge cluster. No reducers,
> mapred.max.map.tasks=6 per machine.
> The AMI is my own and uses Hadoop 0.19.1.
> The dataset has 145K keys, and the processing time is huge.
>
> Now, when I set mapred.map.tasks=14,000, what ends up running is 49 map
> tasks across the machines.
> No machine is running more than 3 tasks; most are running 1, some are
> running 0.
> Looking at the map records read, it appears these 49 tasks correspond to
> the 145K records.
> Q) Why? Why isn't the number of running tasks much higher? If each machine
> can run 6, then why not create more tasks and run them across the
> machines?
> This is underutilization.
>
> So I set mapred.map.tasks=90.
> On the Hadoop machine list, all 90 machines have at least 1 task: mostly 1,
> some 2, and a small few 3+ (max 4).
> On the job tracker page, only 23 are running and 48 are pending (when I sent
> this email).
> With 90 machines (and a Map Task Capacity of 540), why aren't 90 running at
> once?
>
> What should be set? What isn't set?
>
> Regards
> Saptarshi Guha
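
For reference, a rough sketch of the arithmetic behind the question. It assumes the split-size rule used by the file-based input formats of that Hadoop generation, where mapred.map.tasks is only a hint and the split size is max(minSize, min(totalSize / requestedMaps, blockSize)). The function names and the byte counts in the usage note are hypothetical, not taken from this job, and the real implementation also applies a small slop factor to the last split, so treat the result as an estimate.

```python
def split_size(total_bytes, requested_maps, block_bytes, min_bytes=1):
    """Approximate split size: the requested map count is only a hint."""
    # Goal size if the input were divided evenly among the requested maps.
    goal = total_bytes // max(requested_maps, 1)
    # Never larger than a block, never smaller than the minimum split size.
    return max(min_bytes, min(goal, block_bytes))


def estimated_splits(file_sizes, requested_maps, block_bytes, min_bytes=1):
    """Estimate the number of map tasks for a list of input file sizes."""
    size = split_size(sum(file_sizes), requested_maps, block_bytes, min_bytes)
    # Each non-empty file is cut independently into ceil(n / size) pieces.
    return sum(-(-n // size) for n in file_sizes if n > 0)


def map_slots(machines, slots_per_machine):
    """Cluster-wide map capacity, e.g. 90 machines * 6 slots."""
    return machines * slots_per_machine
```

With the figures from the message, map_slots(90, 6) gives the reported capacity of 540. Under these assumptions a small input or a large minimum split size caps the number of map tasks well below the requested value: estimated_splits([1000], 100, 64, min_bytes=2000) yields a single split no matter how many maps are requested.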