On 11/4/08 2:16 AM, "Arijit Mukherjee" <[EMAIL PROTECTED]>
wrote:

> * 1-5 TB external storage
> 
> I'm curious to find out what sort of specs people normally use. Is
> the external storage essential, or will the individual disks on each node
> be sufficient? Why would you need external storage in a Hadoop
> cluster? 

    The reason for the external storage is twofold:

A) A shared home directory (especially for the HDFS user, so that the start
scripts, which call ssh, are easy to use).

B) An off-machine copy of the fsimage and edits files used by the name
node. That way, if the name node goes belly up, you'll always have an
up-to-date backup to recover from.
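On Hadoop releases of this vintage, one common way to get that off-machine copy is to list more than one directory in `dfs.name.dir` in hadoop-site.xml. The name node writes its fsimage and edits to every directory in that comma-separated list. The short Python sketch below only builds that property element. It assumes one local directory and one NFS mount from the external storage, and both paths are hypothetical:

```python
import xml.etree.ElementTree as ET


def name_dir_property(dirs: list[str]) -> str:
    """Return a hadoop-site.xml <property> that sets dfs.name.dir to several dirs.

    The name node keeps its fsimage and edits in every listed directory, so
    putting one of them on shared external storage yields an off-machine copy.
    """
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "dfs.name.dir"
    ET.SubElement(prop, "value").text = ",".join(dirs)  # comma-separated list
    return ET.tostring(prop, encoding="unicode")


# Hypothetical paths: a local disk plus an NFS mount from the external storage.
snippet = name_dir_property(["/data/hadoop/name", "/mnt/nfs/hadoop/name"])
print(snippet)
```

Pasting the printed element inside `<configuration>` in hadoop-site.xml is the manual equivalent. Later Hadoop releases renamed the property `dfs.namenode.name.dir`.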

> How can I find out what other projects on hadoop are using?

    Slide 12 of the ApacheCon presentation I gave earlier this year covers
what a typical Yahoo! node looks like.  For a small 5-node cluster,
your hardware specs seem fine to me.

    A name node with 8 GB of RAM for 4 data nodes (or maybe even running the
name node on the same machine as a data node, if job memory usage is kept
in check) should be A-OK, even if you double the storage.  You're likely to
run out of disk space before the name node starts swapping.
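To see why disk tends to fill up before name node memory does, here is a back-of-the-envelope estimate. It assumes the commonly quoted rule of thumb that each file and each block costs the name node roughly 150 bytes of heap, the era's default 64 MB block size, 3x replication, and a hypothetical 2 TiB of raw disk per data node. All of these are assumptions, not measurements:

```python
# Back-of-the-envelope NameNode heap estimate. All constants are assumptions:
# ~150 bytes of heap per file or block object is a widely quoted rule of thumb,
# not an exact figure, and directories and JVM overhead are ignored.
MiB = 1024 ** 2
GiB = 1024 ** 3
TiB = 1024 ** 4


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def estimate_namenode_heap(raw_capacity: int,
                           replication: int = 3,
                           block_size: int = 64 * MiB,
                           avg_file_size: int = 64 * MiB,
                           bytes_per_object: int = 150) -> int:
    """Approximate NameNode heap (bytes) needed once raw_capacity is full."""
    logical = raw_capacity // replication                 # user data after replication
    files = ceil_div(logical, avg_file_size)              # number of files stored
    blocks = files * ceil_div(avg_file_size, block_size)  # blocks across all files
    return (files + blocks) * bytes_per_object            # one object per file and per block


if __name__ == "__main__":
    raw = 4 * 2 * TiB  # hypothetical: 4 data nodes with 2 TiB of disk each
    for label, estimate in [
        ("as specced", estimate_namenode_heap(raw)),
        ("storage doubled", estimate_namenode_heap(2 * raw)),
        ("1 MiB average files", estimate_namenode_heap(raw, avg_file_size=MiB)),
    ]:
        print(f"{label}: {estimate / MiB:.1f} MiB (vs. 8 GiB of RAM)")
```

Under these assumptions the full cluster needs about 13 MB of name node heap, doubling the storage roughly doubles that, and even a 1 MiB average file size only brings it to about 840 MB. All of these figures are well under 8 GB.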
