On 26/04/11 14:55, Xiaobo Gu wrote:
> Hi,
> People say a balanced server configuration is something like:
> 2x 4-core CPUs, 24 GB RAM, 4x 1 TB SATA disks
> But we are used to storage servers with 24x 1 TB SATA disks, and we
> are wondering whether Hadoop will be CPU-bound if this kind of server
> is used. Does anybody have experience running Hadoop on servers with
> so many disks?
Some of the new clusters are running one or two 6-core CPUs with 12x 2TB
3.5" HDDs for storage, as this gives maximum storage density (it fits in
a 1U). The exact CPU:RAM:disk ratio depends on the application.
What you get with the big servers is:
- a higher probability of data-local task execution
- great IO bandwidth, especially if you set the mapred.temp.dir
  value to include all the drives
- fewer servers means fewer network ports on the switches, so you can
  save some money on the network fabric, and on the time/effort of
  cabling everything up
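The drive-spreading trick above is just a comma-separated directory list in the job configuration; a minimal sketch, assuming hypothetical mount points /data/1, /data/2, ... (the paths are illustrative, not from the original mail):

```xml
<!-- Sketch only: the /data/N mount points are hypothetical; use one
     entry per physical drive so temp IO is spread across spindles. -->
<property>
  <name>mapred.temp.dir</name>
  <value>/data/1/mapred/tmp,/data/2/mapred/tmp,/data/3/mapred/tmp</value>
</property>
```

The same one-directory-per-drive pattern is what gives a 12- or 24-disk box its aggregate IO bandwidth.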
What do you lose?
- in a small cluster, the loss of a single machine matters
- in a large cluster, the loss of a single machine can generate up to
  24TB of re-replication traffic (more once 3TB HDDs become affordable)
- in a large cluster, the loss of a rack (or switch) can generate a
  very large amount of traffic
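The 24TB figure is just disks times capacity: when a node dies, HDFS has to re-create a copy of every block that node held, so the wire traffic is bounded by the node's total storage. A back-of-envelope sketch (node sizes here are illustrative):

```python
# Sketch: upper bound on re-replication traffic after losing one
# datanode. Assumes HDFS must re-copy every block the dead node held;
# fill_fraction scales for a node that wasn't full.

def rereplication_tb(disks: int, tb_per_disk: float,
                     fill_fraction: float = 1.0) -> float:
    """TB the cluster must copy to restore replication after one
    node with `disks` drives of `tb_per_disk` TB each dies."""
    return disks * tb_per_disk * fill_fraction

# A full 24 x 1 TB node puts up to 24 TB back on the wire:
print(rereplication_tb(24, 1.0))   # 24.0
# The denser 12 x 2 TB layout loses the same total per node:
print(rereplication_tb(12, 2.0))   # 24.0
```

Losing a whole rack multiplies this by the number of nodes in the rack, which is why the switch-failure case below dominates the network design.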
If you were building a large (multi-PB) cluster, this design is good for
storage density -you could get a petabyte in a couple of racks- though
the re-replication cost of a Top of Rack switch failure might push you
towards 2x ToR switches and bonded NICs, which introduce a whole new set
of problems.
For smaller installations? I don't know.
-Steve