John,
John Hearns wrote:
Brian R. Smith wrote:
Hey list,
1. Proprietary parallel storage systems (Panasas, etc.): these
provide the per-node bandwidth, aggregate bandwidth, caching
mechanisms, fault tolerance, and redundancy that we require (plus
having a vendor offering 24x7x365 support and 24-hour turnaround is
quite a breath of fresh air for us). The price point is a little high
for the amount of storage we would get, though: little more than
double our current overall capacity. As far as I can tell, I can use
this device as a permanent data store (like /home) and also as the
users' scratch space, so that there is a single point for all data
needs across the cluster. It does, however, require the installation
of vendor kernel modules, which often adds overhead to system
administration (they need to be compiled, linked, and tested before
every kernel update).
If you like Panasas, go with them.
The kernel module thing isn't all that big a deal - they are quite
willing to 'cook' the modules for you. But YMMV.
After some discussion, it came to my attention that it might not be
the best solution; I will probably still need to fork over for a
/home solution anyway. I'll contact them just to be sure.
Our final problem is a relatively simple one, though I am definitely
a newbie to the H.A. world. Under this consolidation plan, we will
have only one point of entry to this cluster and hence a single point
of failure. Have any beowulfers had experience deploying clusters
with redundant head nodes in a pseudo-H.A. fashion (heartbeat
monitoring, fail-over, etc.), and what experience have you had
adapting your resource manager to this task? Would it simply be more
feasible to move the resource manager to another machine at this
point (and have both head nodes act as submit and administrative
clients)? My current plan is unfortunately light on the details of
handling SGE in such an environment. It includes purchasing two
identical 1U boxes (with good support contracts). They will monitor
each other for availability, and the goal is to have the spare take
over if the master fails. While the spare is not in use, I was
planning on dispatching jobs to it.
I have constructed several clusters using HA.
I believe Joe Landman has also - as you are in the States, why not
give some thought to contacting Scalable and getting them to do some
more detailed designs for you?
I have implemented several HA clusters using Linux-HA and heartbeat.
This is an active/passive setup, with a primary and a backup head
node. On failover, the backup head node starts up the cluster
services.
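For the archives, the Heartbeat side of that is only a few lines. A
minimal sketch of /etc/ha.d/ha.cf - the host names, interface, and
timings are made up, not from any production system:

    # /etc/ha.d/ha.cf - identical on both head nodes (say head1/head2)
    node head1 head2
    # two heartbeat links, as recommended further down: a dedicated
    # crossover on eth1 plus a serial cable as the second path
    bcast eth1
    serial /dev/ttyS0
    keepalive 2        # seconds between heartbeats
    deadtime 30        # declare the peer dead after 30s of silence
    auto_failback off  # stay put after failover until an admin looks

auto_failback off is a matter of taste; it keeps the cluster from
flapping back while you are still diagnosing the original failure.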
Failing over SGE is (relatively) easy - the main part is making sure
that the cluster spool directory is on shared storage.
And mounting that shared storage on one machine or the other :-)
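In Heartbeat v1 terms, that whole chain - take the service IP, mount
the spool, start the qmaster - can hang off a single haresources
line. A sketch, with a made-up floating IP and a hypothetical shared
block device /dev/sdb1:

    # /etc/ha.d/haresources - identical on both nodes; head1 is the
    # preferred owner of the resource group
    head1 IPaddr::192.168.1.10/24/eth0 \
          Filesystem::/dev/sdb1::/opt/sge/default/spool::ext3 \
          sgemaster

sgemaster stands in for whatever init script starts sge_qmaster on
your install; heartbeat runs the resources left to right on takeover
and stops them in reverse on release.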
Yeah, they have good failover support and we are already running the
Berkeley database (I was planning on this happening one day), so
moving over to a master/shadow configuration should be easy. Shared
storage will be whatever we end up purchasing for that purpose, so it
will be available. I've always run SGE over an NFS share.
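For anyone following along, the SGE side of master/shadow is mostly
one file: list the candidate masters, in failover order, in the
common directory on the shared $SGE_ROOT. A sketch:

    # $SGE_ROOT/default/common/shadow_masters
    # one host per line; the first entry is the normal qmaster
    head1
    head2

One wrinkle I believe applies to Berkeley DB spooling: a local BDB
spool can't safely sit on NFSv3, so shadow setups typically either
run the BDB RPC spooling server or keep the spool on storage both
heads can genuinely share - worth checking against the SGE docs for
your version.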
The harder part is failing over NFS - again, I've done it. I gather
there is a wrinkle or two with NFSv4 on Linux-HA type systems.
Shouldn't be a problem. NFS will be served from a dedicated host and
will have an off-site mirror that can take its place over a VLAN. Not
as fast, but the data is there and the line is dedicated. I'll have
to work on other failover plans (perhaps mirroring in the same room,
with tapes for "off-site"-edness?).
The second way to do this would be to use shared storage together
with the Gridengine queue master failover mechanism. This is a
different approach, in that you have two machines running, using
either a NAS-type storage server or Panasas/Lustre. The SGE spool
directory is on this, and the SGE qmaster will start on the second
machine if the first fails to answer its heartbeat.
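To flesh that out: each backup candidate runs sge_shadowd, which
watches the heartbeat file the qmaster keeps in the shared spool area
and starts a qmaster of its own when that file goes stale. A sketch
of bringing one up by hand - the arch string is just an example:

    # on head2, with SGE's settings.sh already sourced
    $SGE_ROOT/bin/lx24-amd64/sge_shadowd

The daemon consults the shadow_masters file mentioned above to decide
whether this host is allowed to take over.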
PS. 1U boxes? Think something a bit bigger - with hot-swap PSUs.
You also might have to fit a second network card for your HA
heartbeat links (plural - you need two) plus a SCSI card, so think
slightly bigger boxes for the two head nodes.
Yes, good recommendations (no SCSI needed, thankfully). Those are
definitely a couple of factors I forgot to consider when opting for
1U boxes.
You can spec 1U boxes as interactive login/compile/job submission
nodes. Maybe you could run DNS round-robin as a simple load balancer
across these boxes for redundancy - they should all be similar, and
if one stops working, then ho-hum.
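DNS round-robin is just multiple A records on one name. A zone-file
sketch, with made-up names and addresses:

    ; BIND zone fragment: "login" resolves to each box in turn
    login   IN  A   192.168.1.31
    login   IN  A   192.168.1.32
    login   IN  A   192.168.1.33

The usual caveat: round-robin doesn't notice a dead host, so you
still have to pull the record (or put a health-checking front end on
the boxes) when one goes down.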
pps. "when the spare is not in use dispatching jobs to it"
Actually, we also do a cold failover setup which is just like that,
and the backup node is used for running jobs when it is idle.
Thanks a lot for the help!
-Brian
--
--------------------------------------------------------
+ Brian R. Smith +
+ HPC Systems Analyst & Programmer +
+ Research Computing, University of South Florida +
+ 4202 E. Fowler Ave. LIB618 +
+ Office Phone: 1 (813) 974-1467 +
+ Mobile Phone: 1 (813) 230-3441 +
+ Organization URL: http://rc.usf.edu +
--------------------------------------------------------
_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf