On Wed, Apr 27, 2011 at 7:07 PM, Steve Loughran <[email protected]> wrote:
> On 26/04/11 14:55, Xiaobo Gu wrote:
>>
>> Hi,
>>
>>  People say a balanced server configuration is as follows:
>>
>>  2× 4-core CPUs, 24 GB RAM, 4× 1 TB SATA disks
>>
>> But we are used to using storage servers with 24× 1 TB SATA disks,
>> and we are wondering whether Hadoop will be CPU-bound if this kind of
>> server is used. Does anybody have experience running Hadoop on servers
>> with so many disks?
>
> Some of the new clusters are running one or two 6 core CPUs with 12*2TB 3.5"
> HDDs for storage, as this gives maximum storage density (it fits in a 1U).
> The exact ratio of CPU:RAM:disk depends on the application.
>
> What you get with the big servers is
>  -more probability of local access
>  -greater IO bandwidth, especially if you set up the mapred.temp.dir value
> to include all the drives.

Do you mean mapred.local.dir? Another question: how does MapReduce write to
mapred.local.dir, in round-robin fashion?
Is mixing mapred.local.dir and dfs.data.dir on the same disks common practice?
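For context, a multi-disk layout is usually expressed by giving each property a comma-separated list of one directory per drive, in mapred-site.xml and hdfs-site.xml respectively; the framework then spreads local files across the listed directories (roughly round-robin, subject to free space). A sketch with hypothetical mount points /data/1 through /data/4:

```xml
<!-- mapred-site.xml: spread MapReduce scratch space across all drives -->
<!-- (mount points /data/1 ... /data/4 are hypothetical) -->
<property>
  <name>mapred.local.dir</name>
  <value>/data/1/mapred/local,/data/2/mapred/local,/data/3/mapred/local,/data/4/mapred/local</value>
</property>

<!-- hdfs-site.xml: DataNode block storage on the same drives,
     kept in separate subdirectories so the two workloads don't collide -->
<property>
  <name>dfs.data.dir</name>
  <value>/data/1/dfs/data,/data/2/dfs/data,/data/3/dfs/data,/data/4/dfs/data</value>
</property>
```

Sharing the physical disks between the two properties, but in distinct subdirectories, is a common layout; putting both under the same directory is not.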


>  -fewer servers mean fewer network ports on the switches, so you can save
> some money in the network fabric, and in time/effort cabling everything up.
>
> What do you lose?
>  -in a small cluster, loss of a single machine matters
>  -in a large cluster, loss of a single machine can generate up to 24TB of
> replication traffic (more once 3TB HDDs become affordable)
>  -in a large cluster, loss of a rack (or switch) can generate a very large
> amount of traffic.
>
> If you were building a large (multi-PB) cluster, this design is good for
> storage density -you could get a petabyte in a couple of racks, though the
> replication costs of a Top of Rack switch failure might push you towards
> 2xToR switches and bonded NICs, which introduce a whole new set of problems.
>
> For smaller installations? I don't know.
>
> -Steve
>
