Re: EC2, Max tasks, under utilized?

Hong Tang Tue, 23 Jun 2009 10:26:03 -0700

Do you use block compression in your sequence files? How large is your total dataset?


On Jun 23, 2009, at 7:50 AM, Saptarshi Guha wrote:

Hello,
I should also point out that I'm using a SequenceFileInputFormat.

Regards
Saptarshi Guha


On Tue, Jun 23, 2009 at 10:43 AM, Saptarshi Guha
<saptarshi.g...@gmail.com> wrote:
Hello,
I'm running a 90 node c1.xlarge cluster. No reducers,
mapred.max.map.tasks=6 per machine.
The AMI is my own and uses Hadoop 0.19.1.
The dataset has 145K keys, and the processing time is huge.
Now, when I set mapred.map.tasks=14,000, what ends up running is 49 map
tasks across the machines.
No machine is running more than 3 tasks; most are running 1, and some are
running 0.
Looking at the map records read, it appears these 49 tasks correspond to
the 145K records.
Q) Why? Why isn't the number of running tasks much higher? If each machine
can run 6, then why not start more tasks and spread them across the
machines?
This is under-utilization.

So I set mapred.map.tasks=90.
On the Hadoop machine list, all 90 machines show at least 1 task: mostly 1,
some 2, and a small few 3+ (max 4).
On the job tracker page, only 23 are running and 48 are pending (as of when I
sent this email).
With 90 machines (and a Map Task Capacity of 540), why aren't 90 running at
once?

What should be set? What isn't set?

Regards
Saptarshi Guha
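
For readers following the numbers in this thread, the arithmetic can be
sketched as below. This is a minimal illustration and not part of the original
messages: the function names are hypothetical, and it assumes the reported Map
Task Capacity of 540 is simply 90 machines times 6 map slots each, as the
figures in the question suggest.

```python
# Illustrative arithmetic only; the helper names are hypothetical and the
# inputs are the figures quoted in the question above.

def map_slot_capacity(machines, slots_per_machine):
    """Total map slots, assuming every machine offers the same number."""
    return machines * slots_per_machine


def utilization(running_tasks, capacity):
    """Fraction of map slots occupied by running tasks."""
    return running_tasks / capacity


def records_per_task(total_records, tasks):
    """Average number of input records handled by each map task."""
    return total_records / tasks


capacity = map_slot_capacity(90, 6)
print(capacity)  # 540, matching the reported Map Task Capacity

# First run: 49 map tasks for 145K records.
print(f"{utilization(49, capacity):.1%} of slots busy")
print(f"{records_per_task(145_000, 49):.0f} records per task on average")

# Second run: 23 tasks running at the time of the email.
print(f"{utilization(23, capacity):.1%} of slots busy")
```

Under these assumptions, both runs leave most of the map slots idle, which is
the under-utilization the question describes.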
