On 11/4/08 2:16 AM, "Arijit Mukherjee" <[EMAIL PROTECTED]>
wrote:
> * 1-5 TB external storage
>
> I'm curious to find out what sort of specs do people use normally. Is
> the external storage essential or will the individual disks on each node
> be sufficient? Why would you need an external storage in a hadoop
> cluster?
There are two big reasons for the external storage:
A) Provide a shared home directory (especially for the HDFS user, so that it
is easy to use the start scripts that call ssh).
B) Keep an off-machine copy of the fsimage and edits files used by the name
node. This way, if the name node goes belly up, you'll have an always
up-to-date backup to recover from.
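
For (B), one common way to get that copy is to list a second directory in
dfs.name.dir, since the name node writes its image and edits to every
directory in that comma-separated list. Here is a minimal sketch that checks a
config for this. The paths, the /mnt/nfs mount point, and the helper names are
made up for illustration; substitute whatever your site uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical config fragment: /data/1/dfs/name stands in for a local disk,
# /mnt/nfs/dfs/name for an assumed mount point on the external storage.
SAMPLE_CONF = """<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/data/1/dfs/name,/mnt/nfs/dfs/name</value>
  </property>
</configuration>"""


def name_dirs(conf_xml):
    """Return the directories listed under dfs.name.dir, in order."""
    root = ET.fromstring(conf_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == "dfs.name.dir":
            value = prop.findtext("value") or ""
            # The value is a comma-separated list; drop empty entries.
            return [d.strip() for d in value.split(",") if d.strip()]
    return []


def has_off_machine_copy(conf_xml, remote_prefix="/mnt/nfs/"):
    """True if at least one name directory sits under the assumed remote mount."""
    return any(d.startswith(remote_prefix) for d in name_dirs(conf_xml))


print(name_dirs(SAMPLE_CONF))
print(has_off_machine_copy(SAMPLE_CONF))
```

The check only looks at path prefixes, so it cannot tell whether the mount is
actually up; that still needs to be monitored separately.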
> How can I find out what other projects on hadoop are using?
Slide 12 of the ApacheCon presentation I did earlier this year talks
about what Yahoo!'s typical node looks like. For a small 5 node cluster,
your hardware specs seem fine to me.
An 8 GB name node for 4 data nodes (or maybe even running the name node on
the same machine as a data node, if the memory size of jobs is kept in check)
should be a-ok, even if you double the storage. You're likely going to run out
of disk space before the name node starts swapping.
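
To put rough numbers on that last point, here is a back-of-the-envelope
sketch. Every figure in it is an assumption rather than a measurement: the
often-quoted rule of thumb of about 150 bytes of name node heap per file or
block, a 64 MB block size, a replication factor of 3, and a hypothetical 2 TB
of disk on each of the 4 data nodes.

```python
# Rough name node memory estimate. All constants are assumptions.
BYTES_PER_OBJECT = 150         # rule-of-thumb heap cost per file or block
BLOCK_SIZE = 64 * 1024**2      # assumed HDFS block size (64 MB)
REPLICATION = 3                # assumed replication factor


def namenode_bytes(raw_disk_bytes, avg_file_bytes=BLOCK_SIZE):
    """Estimate the heap needed to track a cluster whose disks are full."""
    # Replication means only a fraction of the raw disk holds unique data.
    logical = raw_disk_bytes // REPLICATION
    files = logical // avg_file_bytes
    # Ceiling division: a file occupies at least one block.
    blocks_per_file = -(-avg_file_bytes // BLOCK_SIZE)
    # One object for the file itself plus one per block (directories ignored).
    objects = files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT


raw = 4 * 2 * 1024**4          # hypothetical: 4 data nodes x 2 TB each
for label, size in [("64 MB files", BLOCK_SIZE), ("1 MB files", 1024**2)]:
    mib = namenode_bytes(raw, size) / 1024**2
    print(f"{label}: about {mib:.0f} MiB of name node heap")
```

Under these assumptions, even a cluster filled entirely with small 1 MB files
stays well below 8 GB, and doubling the raw storage only doubles the estimate.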