On 26/04/11 14:55, Xiaobo Gu wrote:
> Hi,
> People say a balanced server configuration is something like:
> 2x 4-core CPUs, 24 GB RAM, 4x 1 TB SATA disks
> But we are used to storage servers with 24x 1 TB SATA disks, and we
> are wondering whether Hadoop will be CPU-bound if this kind of server
> is used. Does anybody have experience running Hadoop on servers with
> so many disks?
Some of the new clusters are running one or two 6-core CPUs with 12x 2TB
3.5" HDDs for storage, as this gives maximum storage density (it fits in
a 1U). The exact CPU:RAM:disk ratio depends on the application.
What you get with the big servers is:
- a higher probability of data-local task execution
- great IO bandwidth, especially if you set the mapred.temp.dir
  value to include all the drives
- fewer servers means fewer network ports on the switches, so you can
  save some money on the network fabric, and on the time/effort of
  cabling everything up
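The drive-spreading trick above is just a comma-separated directory list in the job configuration; a minimal sketch, assuming hypothetical mount points /data/1, /data/2, ... (the paths are illustrative, not from the original mail):

```xml
<!-- Sketch only: the /data/N mount points are hypothetical; use one
     entry per physical drive so temp IO is spread across spindles. -->
<property>
  <name>mapred.temp.dir</name>
  <value>/data/1/mapred/tmp,/data/2/mapred/tmp,/data/3/mapred/tmp</value>
</property>
```

The same one-directory-per-drive pattern is what gives a 12- or 24-disk box its aggregate IO bandwidth.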
What do you lose?
- in a small cluster, the loss of a single machine matters
- in a large cluster, the loss of a single machine can generate up to
  24TB of re-replication traffic (more once 3TB HDDs become affordable)
- in a large cluster, the loss of a rack (or switch) can generate a
  very large amount of traffic
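The 24TB figure is just disks times capacity: when a node dies, HDFS has to re-create a copy of every block that node held, so the wire traffic is bounded by the node's total storage. A back-of-envelope sketch (node sizes here are illustrative):

```python
# Sketch: upper bound on re-replication traffic after losing one
# datanode. Assumes HDFS must re-copy every block the dead node held;
# fill_fraction scales for a node that wasn't full.

def rereplication_tb(disks: int, tb_per_disk: float,
                     fill_fraction: float = 1.0) -> float:
    """TB the cluster must copy to restore replication after one
    node with `disks` drives of `tb_per_disk` TB each dies."""
    return disks * tb_per_disk * fill_fraction

# A full 24 x 1 TB node puts up to 24 TB back on the wire:
print(rereplication_tb(24, 1.0))   # 24.0
# The denser 12 x 2 TB layout loses the same total per node:
print(rereplication_tb(12, 2.0))   # 24.0
```

Losing a whole rack multiplies this by the number of nodes in the rack, which is why the switch-failure case below dominates the network design.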
If you were building a large (multi-PB) cluster, this design is good for
storage density -you could get a petabyte in a couple of racks- though
the re-replication cost of a Top of Rack switch failure might push you
towards 2x ToR switches and bonded NICs, which introduce a whole new set
of problems.
For smaller installations? I don't know.
-Steve