Scott> I'm looking at building a small cluster of disk-less 1 or 2U
Scott> servers and will probably use CentOS 5.
Disk is cheap, so why not use it? Local disks also mean that on bootup
the whole cluster doesn't hit the tftp/dhcpd/nfs server to pull over
the OS and daemons at the same time.

Scott> Since these machines will not have any hard drives, what would
Scott> be the minimum amount of RAM I'd need?

The more the better. It depends on your budget and on which systems
you are getting, too. I assume you are getting 64-bit motherboard/CPU
combos? Something that can handle 8GB or more would be nice,
especially if it's got a flat address space so that access to memory
over 4GB isn't slowed down a lot. Opterons are good for this, and
probably the newer Core 2 Duos from Intel as well. But be careful that
the motherboard has an IOMMU to properly map memory around.

Scott> Also, if using Rocks or something similar, will that help
Scott> cluster the RAM together so 4 servers x 4 GB RAM each = 16 GB
Scott> available?

Generally, no. You need a NUMA (Non-Uniform Memory Access) aware
kernel and applications, plus a fast interconnect between nodes, like
Myrinet, which has high bandwidth and low latency. It's a quite
specialized subset of compute clusters. Once you have to access
off-node memory, you take a huge performance hit. So unless you have a
large problem which decomposes easily into subsets that can run
independently on separate nodes, you won't be able to efficiently pool
memory across nodes. It's a great dream, but in practice it doesn't
work. Just put a bunch of memory on each node (it's cheap these
days...) and be done with it.

Oh yeah, if you can, get ECC memory. There's nothing worse than having
a three-day compute job crap out because of a single-bit error in
memory which could have been caught with error correction.

Scott> Some applications might be CPU intensive, others might be RAM
Scott> intensive, so I need to play that balance, too.

Are the users writing their own code to run on the cluster?
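As an aside, the "decompose into independent subsets" point above can
be sketched in a few lines. This is just an illustration on one
machine (multiprocessing stands in for separate nodes; the function
names are made up for the example), but the shape is the same: each
worker touches only its own slice of the data, and the only
communication is combining the per-worker results at the end.

```python
# Sketch only: multiprocessing workers stand in for cluster nodes.
from multiprocessing import Pool

def partial_sum(chunk):
    # Each worker sees only its own chunk -- no shared memory needed,
    # so this maps cleanly onto separate nodes with private RAM.
    return sum(x * x for x in chunk)

def decompose(data, n_workers):
    # Split the input into n_workers roughly equal, independent slices.
    size = (len(data) + n_workers - 1) // n_workers
    return [data[i:i + size] for i in range(0, len(data), size)]

if __name__ == "__main__":
    data = list(range(1000))
    chunks = decompose(data, 4)
    with Pool(4) as pool:
        results = pool.map(partial_sum, chunks)
    # Combining per-worker results is the only communication step.
    print(sum(results))
```

If your problem doesn't break apart like this (lots of random access
into one big shared data set), that's exactly when pooled memory over
an interconnect falls over.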
If so, then I'd probably just run a basic job scheduler like Rocks or
LSF (non-free) or Sun Grid Engine or one of the other ones out there,
and have a single master node which users log into to submit jobs and
control them. The rest of the nodes are set up on their own private
subnet and locked down so that, if possible, users aren't running
interactive stuff on them.

But hey, you need to do some more research and answer some general
questions on *how* your users are expecting to use these resources
before you actually buy and build anything.

Good luck,
John

_______________________________________________
bblisa mailing list
[email protected]
http://www.bblisa.org/mailman/listinfo/bblisa
