On Wed, Apr 27, 2011 at 7:07 PM, Steve Loughran <[email protected]> wrote:
> On 26/04/11 14:55, Xiaobo Gu wrote:
>>
>> Hi,
>>
>> People say a balanced server configuration is as follows:
>>
>> 2 x 4-core CPUs, 24GB RAM, 4 x 1TB SATA disks
>>
>> But we are used to storage servers with 24 x 1TB SATA disks, and we
>> are wondering whether Hadoop will be CPU-bound on servers like that.
>> Does anybody have experience running Hadoop on servers with so many
>> disks?
>
> Some of the new clusters are running one or two 6-core CPUs with 12 x 2TB
> 3.5" HDDs for storage, as this gives maximum storage density (it fits in
> a 1U). The exact ratio of CPU:RAM:disk depends on the application.
>
> What you get with the big servers is:
> -a higher probability of local data access
> -greater I/O bandwidth, especially if you set the mapred.temp.dir value
> to include all the drives.
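For reference, spreading the local directory across drives is just a comma-separated list in mapred-site.xml; the property is mapred.local.dir, and the mount paths below are only illustrative:

```xml
<!-- mapred-site.xml: one entry per physical drive (example paths) -->
<property>
  <name>mapred.local.dir</name>
  <value>/disk1/mapred/local,/disk2/mapred/local,/disk3/mapred/local</value>
</property>
```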
Do you mean mapred.local.dir?

Another question: how does MapReduce write to mapred.local.dir -- in
round-robin order? And is mixing mapred.local.dir and dfs.data.dir on the
same disks common practice?

> -fewer servers means fewer network ports on the switches, so you can
> save some money on the network fabric, and on the time/effort of cabling
> everything up.
>
> What do you lose?
> -in a small cluster, the loss of a single machine matters
> -in a large cluster, the loss of a single machine can generate up to
> 24TB of replication traffic (more once 3TB HDDs become affordable)
> -in a large cluster, the loss of a rack (or switch) can generate a very
> large amount of traffic.
>
> If you were building a large (multi-PB) cluster, this design is good for
> storage density -you could get a petabyte in a couple of racks, though
> the replication costs of a Top of Rack switch failure might push you
> towards 2xToR switches and bonded NICs, which introduce a whole new set
> of problems.
>
> For smaller installations? I don't know.
>
> -Steve
>
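On the round-robin question: a minimal sketch of round-robin allocation across configured local directories (the class name and the simple skip-if-unusable policy here are my assumption, not Hadoop's actual LocalDirAllocator, which also weighs free space):

```python
import itertools

class RoundRobinDirAllocator:
    """Hand out local directories in round-robin order.

    Illustrative sketch only: a real allocator would also check free
    space (e.g. via os.statvfs) and write permissions before choosing.
    """

    def __init__(self, dirs):
        self.dirs = list(dirs)
        self._cycle = itertools.cycle(range(len(self.dirs)))

    def _usable(self, path):
        # Sketch: treat every configured directory as usable.
        return True

    def next_dir(self):
        # Try each configured directory once, starting at the cursor.
        for _ in range(len(self.dirs)):
            d = self.dirs[next(self._cycle)]
            if self._usable(d):
                return d
        raise IOError("no usable local directory")

alloc = RoundRobinDirAllocator(["/disk1/mapred", "/disk2/mapred", "/disk3/mapred"])
print([alloc.next_dir() for _ in range(4)])
# → ['/disk1/mapred', '/disk2/mapred', '/disk3/mapred', '/disk1/mapred']
```

The effect is that successive spill files land on successive drives, which is what gives the aggregate I/O bandwidth across all spindles.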
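To put a rough number on the re-replication point above, here is the back-of-envelope arithmetic; the fill level and aggregate bandwidth figures are illustrative assumptions, not measurements:

```python
# Rough estimate of time to re-replicate a dead storage node's blocks.
# All input figures are illustrative assumptions.

def rereplication_hours(disks, tb_per_disk, fill_fraction, gbit_per_sec):
    """Hours to re-copy a lost node's data at a given aggregate rate."""
    lost_tb = disks * tb_per_disk * fill_fraction   # data to re-replicate
    lost_bits = lost_tb * 8 * 1e12                  # TB -> bits
    seconds = lost_bits / (gbit_per_sec * 1e9)      # at the aggregate rate
    return seconds / 3600.0

# 24 x 1TB disks, 80% full, 10 Gbit/s aggregate re-replication bandwidth
print(round(rereplication_hours(24, 1, 0.8, 10), 1))  # → 4.3
```

Even with a generous 10 Gbit/s of spare cluster bandwidth, losing one 24-disk node keeps the network busy for hours, which is why the density trade-off bites harder as disks per node go up.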
