Re: Distributed Clusters

Steve Loughran Thu, 08 Apr 2010 03:19:43 -0700

James Seigel wrote:

I am new to this group, and relatively new to hadoop.

I am looking at building a large cluster.  I was wondering if anyone has any 
best practices for a cluster in the hundreds of nodes?  As well, has anyone had 
experience with a cluster spanning multiple data centers.  Is this a bad 
practice? moderately bad practice?  insane?


got some stuff here
http://wiki.smartfrog.org/wiki/display/sf/Patterns+of+Hadoop+Deployment

though my clusters are of short life span and smaller. At that kind of scale you need to know how to manage datacenters yourself or talk to people who do (I deny all knowledge, though I will note that in HP consulting and EDS we do have people who can handle this)

Is it better to build the 1000 node cluster in a single data center?


yes.

Do you back one of these things up to a second data center or a different 1000 
node cluster?


depends on your concerns and where the building is.

-If your facility is in the Bay Area then you want a separate datacentre on a different fault line. If it's in Eastern WA or OR then you worry more about volcanic activity and spec the roof to take 1-2m of volcanic ash. Power comes off the big dams, which again may go down if there's an earthquake, but is otherwise pretty reliable.

-if your worry is about continuous availability, you need different sites with different (multiple) power suppliers and multiple data feeds, and more to worry about in terms of keeping things in sync. Data transfer will cost time and money: for a big enough cluster (1000 servers can go up to 6-12 PB of storage) a full sync takes a long time. Even at the CERN LHC experiments' data rate of 1 PB/month off the LHC, it would take at least 6 months to get the data into your cluster using a good protocol like GridFTP.

-single site would make sync easier; 10 Gb Ethernet will still take a while, but the transfer will not cost you money
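
The sync-time figures above can be checked with some back-of-envelope arithmetic. This is only a rough sketch: the function names are made up for illustration, and it assumes decimal units (1 PB = 1e15 bytes), a sustained transfer rate, and by default a fully saturated link, none of which you would see in practice.

```python
# Back-of-envelope sync times for a large cluster.
# Assumptions (not from the original post): decimal units, sustained
# rates, and 100% link efficiency unless a lower value is passed in.

PB = 1e15  # bytes in a petabyte (decimal)
SECONDS_PER_DAY = 86_400


def months_at_monthly_rate(data_pb: float, rate_pb_per_month: float) -> float:
    """Months needed to move data_pb at a sustained PB/month rate."""
    return data_pb / rate_pb_per_month


def days_over_link(data_pb: float, link_gbit_per_s: float,
                   efficiency: float = 1.0) -> float:
    """Days needed to move data_pb over a link of the given nominal speed."""
    usable_bits_per_s = link_gbit_per_s * 1e9 * efficiency
    seconds = data_pb * PB * 8 / usable_bits_per_s
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    for size_pb in (6, 12):
        months = months_at_monthly_rate(size_pb, 1.0)
        days = days_over_link(size_pb, 10)
        print(f"{size_pb} PB at 1 PB/month: {months:.0f} months")
        print(f"{size_pb} PB over one 10 Gb/s link: {days:.0f} days")
```

At 1 PB/month, 6 PB takes 6 months and 12 PB takes 12. A single saturated 10 Gb/s link would move 6 PB in roughly 56 days; a real cluster has many links in parallel inside one site, which is why the single-site sync is slow but not the bottleneck that a WAN transfer is.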


Sorry, I am asking crazy questions...I am just wanting to learn the meta issues 
and opportunities with making clusters.

Start small, automate everything, worry about scaling up the management problems. The Hadoop filestore and JT scale well, but you have to get your ops right. That's everything from BIOS upgrades to log file management.
