Brian,
(Threads like this can get confusing; which Brian? :)
Brian D. Ropers-Huilman wrote:
> Brian,
> There are usually three or four categories of storage:
> 1) /home - small, just enough to keep source files and compile code
> 2) /scratch/local - distributed disks within a cluster for local
> writing (think Gaussian)
> 3) /scratch/global - a high-performance (and higher cost) parallel
> file system accessible by all nodes
> 4) /archive - a very large pool of spinning disks which receives data
> from /scratch/global when a run (or set of consecutive runs) is
> "complete." The idea is to clear off the expensive parallel system for
> other run-time use, but that you still want to hold the data for some
> future need.
We have 1-3. Our equivalent of 4 is our 1, but we make it the user's
responsibility to move their data there. I like your idea, though.
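For context, here is roughly how the four tiers would look from one of
our compute nodes. The hostnames, devices, filesystem types, and mount
options below are made up; it's just a sketch of the layout:

  # /etc/fstab on a compute node (hypothetical names and options)
  home-server:/export/home        /home           nfs     rw,hard,intr      0 0
  /dev/sdb1                       /scratch/local  ext3    noatime           0 0
  mds1@tcp0:/scratch              /scratch/global lustre  defaults,_netdev  0 0
  archive-server:/export/archive  /archive        nfs     rw,hard,intr      0 0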
> I would keep your /home and /scratch/global separate.
I've thought about this, and it makes sense on a couple of levels:
a) A lot of the data that gets written to /scratch/global is fairly
transient in nature. Some results a user might keep; many others they
discard. If /home == /scratch/global, then chances are our backup
tapes will be littered with data that nobody wants. b) It avoids a
single point of failure.

However, there are some advantages, I think, if you can merge the two:
a) You only have one storage device to administer, and all of your
efforts for fault tolerance, monitoring, and maintenance can be
focused on that device. When you're a one-man cluster army,
sysadmining and maintaining while also testing, developing, and
deploying codes, you learn to appreciate consolidation of this nature.
Sure, it may appear to be a single point of failure, but the plan also
includes an offsite backup volume which can be VLAN'ed into the
cluster's network. If the local array dies, the offsite array can take
its place (albeit with significantly reduced performance) until
repairs can be made to the main array. The offsite array should also
be able to be moved physically (fairly quickly) to our datacenter as a
drop-in replacement.
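To make that failover idea a little more concrete, the swap I have in
mind is roughly the following, assuming the offsite volume is exported
over NFS once it's VLAN'ed in (hostnames and paths are hypothetical):

  # local array has died; lazy-unmount it and pull in the offsite copy
  umount -l /scratch/global
  mount -t nfs -o rw,hard,intr offsite-array:/export/scratch /scratch/global

Performance over the offsite link will be much worse, but jobs can
keep running until the main array is repaired or the offsite box is
physically moved into the datacenter.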
> The /scratch/global solution you pick will very much depend on how you
> want it connected to your clusters. By definition (of your cluster
> suite) you cannot have a system that relies on IB as not all of your
> systems have IB. This leaves GbE as the only global means of
> connection. If at all possible, I would dedicate a GbE interface on
> all nodes who access /scratch/global.
Yes, this is unfortunate. But fortunately, very few of the problems
running on the current system need disk access at the level an
IB-connected storage device would provide. It would be good to have
later, but we can pass for now. I agree with the separate networks as
well; I've heard the same recommendation elsewhere.
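For what it's worth, the way I picture the dedicated interface is just
a second GbE port on each node sitting on its own storage subnet, with
/scratch/global mounted via the file server's address on that subnet.
Addresses, device names, and the NFS assumption below are all
hypothetical:

  # eth0 = MPI/cluster traffic, eth1 = storage-only network
  ifconfig eth1 10.2.0.101 netmask 255.255.255.0 up
  # mount global scratch through the storage-side address so its
  # traffic never competes with MPI traffic on eth0
  mount -t nfs -o rw,hard,intr 10.2.0.1:/export/scratch /scratch/global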
Thanks for the advice!
Brian
--
--------------------------------------------------------
+ Brian R. Smith +
+ HPC Systems Analyst & Programmer +
+ Research Computing, University of South Florida +
+ 4202 E. Fowler Ave. LIB618 +
+ Office Phone: 1 (813) 974-1467 +
+ Mobile Phone: 1 (813) 230-3441 +
+ Organization URL: http://rc.usf.edu +
--------------------------------------------------------