John,
John Hearns wrote:
Brian R. Smith wrote:
Hey list,
1. Proprietary parallel storage systems (Panasas, etc.): these
provide the per-node bandwidth, aggregate bandwidth, caching
mechanisms, fault tolerance, and redundancy that we require (plus
having a vendor offering 24x7x365 support and 24-hour turnaround is
quite a breath of fresh air for us). The price point is a little high
for the amount of storage we would get, though: little more than
double our current overall capacity. As far as I can tell, I can use
this device as a permanent data store (like /home) and also as the
users' scratch space, so that there is a single point for all data
needs across the cluster. It does, however, require the installation
of vendor kernel modules, which often adds overhead to system
administration (they need to be compiled, linked, and tested before
every kernel update).
If you like Panasas, go with them.
The kernel module thing isn't all that big a deal - they are quite
willing to 'cook' the modules for you. But YMMV.
After some discussion, it came to my attention that it might not be
the best solution; I will probably still need to fork over for a
/home solution anyway. I'll contact them just to be sure.
Our final problem is a relatively simple one, though I am definitely
a newbie to the H.A. world. Under this consolidation plan, we will
have only one point of entry to this cluster and hence a single point
of failure. Have any beowulfers had experience deploying clusters
with redundant head nodes in a pseudo-H.A. fashion (heartbeat
monitoring, fail-over, etc.), and what experience have you had
adapting your resource manager to this task? Would it simply be more
feasible to move the resource manager to another machine at this
point (and have both head nodes act as submit and administrative
clients)? My current plan is unfortunately light on the details of
handling SGE in such an environment. It includes purchasing two
identical 1U boxes (with good support contracts). They will monitor
each other for availability, and the goal is to have the spare take
over if the master fails. While the spare is not in use, I was
planning on dispatching jobs to it.
I have constructed several clusters using HA.
I believe Joe Landman has also - as you are in the States, why not
give some thought to contacting Scalable and getting them to do some
more detailed designs for you?
I have implemented several HA clusters using Linux-HA and heartbeat.
This is an active/passive setup, with a primary and a backup head
node. On failover, the backup head node starts up the cluster
services.
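For the archives, the Heartbeat side of that is only a few lines. A
minimal sketch of /etc/ha.d/ha.cf - the host names, interface, and
timings are made up, not from any production system:

    # /etc/ha.d/ha.cf - identical on both head nodes (say head1/head2)
    node head1 head2
    # two heartbeat links, as recommended further down: a dedicated
    # crossover on eth1 plus a serial cable as the second path
    bcast eth1
    serial /dev/ttyS0
    keepalive 2        # seconds between heartbeats
    deadtime 30        # declare the peer dead after 30s of silence
    auto_failback off  # stay put after failover until an admin looks

auto_failback off is a matter of taste; it keeps the cluster from
flapping back while you are still diagnosing the original failure.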
Failing over SGE is (relatively) easy - the main part is making sure
that the cluster spool directory is on shared storage.
And mounting that shared storage on one machine or the other :-)
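In Heartbeat v1 terms, that whole chain - take the service IP, mount
the spool, start the qmaster - can hang off a single haresources
line. A sketch, with a made-up floating IP and a hypothetical shared
block device /dev/sdb1:

    # /etc/ha.d/haresources - identical on both nodes; head1 is the
    # preferred owner of the resource group
    head1 IPaddr::192.168.1.10/24/eth0 \
          Filesystem::/dev/sdb1::/opt/sge/default/spool::ext3 \
          sgemaster

sgemaster stands in for whatever init script starts sge_qmaster on
your install; heartbeat runs the resources left to right on takeover
and stops them in reverse on release.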
Yeah, they have good failover support and we are already running the
Berkeley database (I was planning on this happening one day), so
moving over to a master/shadow configuration should be easy. Shared
storage will be whatever we end up purchasing for that purpose, so it
will be available. I've always run SGE over an NFS share.
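For anyone following along, the SGE side of master/shadow is mostly
one file: list the candidate masters, in failover order, in the
common directory on the shared $SGE_ROOT. A sketch:

    # $SGE_ROOT/default/common/shadow_masters
    # one host per line; the first entry is the normal qmaster
    head1
    head2

One wrinkle I believe applies to Berkeley DB spooling: a local BDB
spool can't safely sit on NFSv3, so shadow setups typically either
run the BDB RPC spooling server or keep the spool on storage both
heads can genuinely share - worth checking against the SGE docs for
your version.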
The harder part is failing over NFS - again, I've done it. I gather
there is a wrinkle or two with NFSv4 on Linux-HA type systems.
Shouldn't be a problem. NFS will be served from a dedicated host and
will have an off-site mirror that can take its place over a VLAN. Not
as fast, but the data is there and the line is dedicated. I'll have
to work on other failover plans (perhaps mirroring in the same room,
with tapes for "off-site"-edness?).
The second way to do this would be to use shared storage together
with the Gridengine queue master failover mechanism. This is a
different approach, in that you have two machines running, using
either a NAS-type storage server or Panasas/Lustre. The SGE spool
directory is on this, and the SGE qmaster will start on the second
machine if the first fails to answer its heartbeat.
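To flesh that out: each backup candidate runs sge_shadowd, which
watches the heartbeat file the qmaster keeps in the shared spool area
and starts a qmaster of its own when that file goes stale. A sketch
of bringing one up by hand - the arch string is just an example:

    # on head2, with SGE's settings.sh already sourced
    $SGE_ROOT/bin/lx24-amd64/sge_shadowd

The daemon consults the shadow_masters file mentioned above to decide
whether this host is allowed to take over.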
PS. 1U boxes? Think something a bit bigger - with hot-swap PSUs.
You also might have to fit a second network card for your HA
heartbeat links (plural - you need two) plus a SCSI card, so think
slightly bigger boxes for the two head nodes.
Yes, good recommendations (no SCSI needed, thankfully). Those are
definitely a couple of factors I forgot to consider when opting for
1U boxes.
You can spec 1U boxes as interactive login/compile/job submission
nodes. Maybe you could run DNS round-robin as a simple load balancer
across these boxes for redundancy - they should all be similar, and
if one stops working, then ho-hum.
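DNS round-robin is just multiple A records on one name. A zone-file
sketch, with made-up names and addresses:

    ; BIND zone fragment: "login" resolves to each box in turn
    login   IN  A   192.168.1.31
    login   IN  A   192.168.1.32
    login   IN  A   192.168.1.33

The usual caveat: round-robin doesn't notice a dead host, so you
still have to pull the record (or put a health-checking front end on
the boxes) when one goes down.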
pps. "when the spare is not in use dispatching jobs to it"
Actually, we also do a cold failover setup which is just like that,
and the backup node is used for running jobs when it is idle.
Thanks a lot for the help!
-Brian
--
--------------------------------------------------------
+ Brian R. Smith +
+ HPC Systems Analyst & Programmer +
+ Research Computing, University of South Florida +
+ 4202 E. Fowler Ave. LIB618 +
+ Office Phone: 1 (813) 974-1467 +
+ Mobile Phone: 1 (813) 230-3441 +
+ Organization URL: http://rc.usf.edu +
--------------------------------------------------------
_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf