On 02/06/15 21:11, Peter Kjellstrom wrote:

> Or not even that. We extract an image onto a local drive on each boot.
> Nodes have no state in this case (the local rootfs essentially just a
> cache of the network provided image).
Basically there are lots of ways to exfoliate this particular feline, and it all depends on your reasons for doing it. We went statelite (in xCAT terminology), where the nodes boot a RAMdisk with some shared state via NFS bind mounts, to get away from having hard disks in nodes and so remove them as a source of failure.

With that, and using cgroups in Slurm to constrain jobs to the memory they've requested (and not overcommitting memory), it's worked out pretty well. Next on the list for us is to investigate this Slurm plugin that uses the kernel's namespace support to replace the tiny /tmp in the RAMdisk with a per-job scratch directory on GPFS:

https://github.com/hpc2n/spank-private-tmp

cheers!
Chris

--
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: [email protected]      Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci

_______________________________________________
Beowulf mailing list, [email protected], sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
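For anyone wanting to try the same cgroup memory containment, it's mostly Slurm configuration. A minimal sketch might look like the following (the plugin path in plugstack.conf is an assumption, not something from the plugin's docs, and AllowedRAMSpace is shown at its usual default):

```
# slurm.conf -- use the cgroup task plugin and schedule memory as a resource
TaskPlugin=task/cgroup
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory

# cgroup.conf -- hold each job to the memory it requested, swap included
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
AllowedRAMSpace=100        # percent of requested memory the job may use

# plugstack.conf -- load the SPANK private-tmp plugin
# (filename/path below is a guess; build it from the repo above and adjust)
required /usr/lib64/slurm/spank_private_tmp.so
```

With CR_Core_Memory in place, jobs that don't specify --mem get the partition default, so it's worth setting DefMemPerCPU sensibly too, otherwise the constraint bites harder than intended.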
