RE: Scaling hadoop up

Dhruba Borthakur Thu, 29 Mar 2007 13:07:51 -0800

I agree that running the Namenode and the jobtracker on different machines
is the way to go.  How much physical memory do these nodes have? Do you see
any swapping activity on these nodes?


Also, please use Java 1.6. If you are using Java 1.5, then the Namenode will
consume plenty of CPU in socket polling.

Thanks,
dhruba

-----Original Message-----
From: Michael Bieniosek [mailto:[EMAIL PROTECTED] 
Sent: Thursday, March 29, 2007 2:02 PM
To: [email protected]; Doug Cutting
Subject: Re: Scaling hadoop up

I've seen this with 0.12.1.

Currently I'm just running the jobtracker and namenode on one machine, with
tasktrackers & datanodes on all the others (no secondarynamenode).  It seems
like it might help to put the jobtracker and namenode on different machines;
is there anything else I could try?

-Michael

On 3/29/07 1:37 PM, "Doug Cutting" <[EMAIL PROTECTED]> wrote:

> Michael Bieniosek wrote:
>> When I try to scale Hadoop up to about 100 nodes on EC2 (single-cpu Xen), I
>> notice things start to fall apart.  For example, the jobtracker starts
>> dropping requests with the message "Call queue overflow discarding oldest
>> call".  I've also seen problems with the namenode where dfs requests fail
>> with EOFExceptions.
> 
> What version of Hadoop are you seeing this with?  Scalability has been
> improving.
> 
>> I've tried increasing the heartbeat value for the dfs (it's not configurable
>> for the jobtracker though).  Is there some other trick to make hadoop scale
>> a little further?  The website claims that Hadoop has scaled to 600 nodes,
>> but it seems like I would need a very powerful machine for the namenode and
>> jobtracker to do this.  Am I missing something?
> 
> Yahoo! does use dual-processor nodes that are more powerful than EC2's
> virtual nodes, but probably not 6x more powerful.
> 
> Doug
