Speaking of the NFS-backup idea: if I have secure NFS storage that is much slower than the network we use between nodes (3MB/s vs. 100MB/s), will it adversely affect performance, or can I rely on NFS caching to do the job? And if the NFS share dies, will it shut down the namenode as well?
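For reference, my understanding is that the extra copy is just an additional entry in dfs.name.dir, so the namenode writes the fsimage and edits to every directory in the list. A rough sketch of what I have in mind for hadoop-site.xml (property name from the 0.18-era docs; paths are made up):

  <property>
    <name>dfs.name.dir</name>
    <!-- comma-separated list: local disk first, NFS mount second -->
    <value>/local/hadoop/dfs/name,/mnt/nfs/hadoop/dfs/name</value>
  </property>

As far as I can tell, only the edits log and periodic checkpoints would go over NFS, never block data between datanodes, so the 100MB/s node-to-node traffic should be unaffected; my worry is only whether slow edits writes hurt namespace operations.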
-----Original Message-----
From: Allen Wittenauer [mailto:[EMAIL PROTECTED]
Sent: Sunday, September 21, 2008 1:38 PM
To: [email protected]
Subject: Re: Hadoop Cluster Size Scalability Numbers?

On 9/21/08 9:40 AM, "Guilherme Menezes" <[EMAIL PROTECTED]> wrote:
> We currently have 4 nodes (16GB of RAM, 6 * 750 GB disks, Quad-Core AMD
> Opteron processor). Our initial plans are to perform a Web crawl for
> academic purposes (something between 500 million and 1 billion pages),
> and we need to expand the number of nodes for that. Is it better to have
> a larger number of simpler nodes than the ones we currently have (less
> memory, less processing?) in terms of Hadoop performance?

    Your current boxes seem overpowered for crawling. If it were me, I'd
probably:

a) turn the current four machines into a dedicated namenode, job tracker,
secondary namenode, and oh-no-a-machine-just-died! backup node (set up an
NFS server on it and keep a secondary direct copy of the fsimage and edits
files on it if you don't have one). With 16GB namenodes, you should be
able to store a lot of data.

b) when you buy new nodes, I'd cut down on memory and CPU and just turn
them into your work horses.

    That said, I know little-to-nothing about crawling. So, IMHO on the
above.
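For the secondary-namenode piece of (a), the knobs I am looking at are the 0.18-era checkpoint settings; hostname and paths below are made up:

  <property>
    <name>fs.checkpoint.dir</name>
    <!-- where the secondary keeps its copy of the checkpointed image -->
    <value>/local/hadoop/dfs/namesecondary</value>
  </property>
  <property>
    <name>fs.checkpoint.period</name>
    <!-- seconds between checkpoints of fsimage+edits -->
    <value>3600</value>
  </property>
  <property>
    <name>dfs.http.address</name>
    <!-- namenode web address the secondary pulls the image and edits from -->
    <value>namenode-host:50070</value>
  </property>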
