On 02/06/15 21:11, Peter Kjellstrom wrote:

> Or not even that. We extract an image onto a local drive on each boot.
> Nodes have no state in this case (the local rootfs essentially just a
> cache of the network provided image).
Basically there are lots of ways to exfoliate this particular feline, and it all depends on your reasons for doing it. We went statelite (in xCAT terminology), where the nodes boot a RAMdisk with some shared state via NFS bind mounts, to get away from having hard disks in nodes and so remove them as a source of failure.

With that, and using cgroups in Slurm to constrain jobs to the memory they've requested (and not overcommitting memory), it's worked out pretty well. Next on the list for us is to investigate this Slurm plugin that uses the kernel's namespace support to replace the tiny /tmp in the RAMdisk with a per-job scratch directory on GPFS:

https://github.com/hpc2n/spank-private-tmp

cheers!
Chris

--
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: [email protected]      Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci

_______________________________________________
Beowulf mailing list, [email protected], sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
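For anyone wanting to try the same cgroup memory containment, it's mostly Slurm configuration. A minimal sketch might look like the following (the plugin path in plugstack.conf is an assumption, not something from the plugin's docs, and AllowedRAMSpace is shown at its usual default):

```
# slurm.conf -- use the cgroup task plugin and schedule memory as a resource
TaskPlugin=task/cgroup
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory

# cgroup.conf -- hold each job to the memory it requested, swap included
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
AllowedRAMSpace=100        # percent of requested memory the job may use

# plugstack.conf -- load the SPANK private-tmp plugin
# (filename/path below is a guess; build it from the repo above and adjust)
required /usr/lib64/slurm/spank_private_tmp.so
```

With CR_Core_Memory in place, jobs that don't specify --mem get the partition default, so it's worth setting DefMemPerCPU sensibly too, otherwise the constraint bites harder than intended.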
