Hi,

Our cluster has recently become far more complex, and we have put some interim scheduling policies in place until we can work out something better. I wonder if there are cluster administrators in the community who could advise us, please. I suspect others have similar setups in place.
Basically we have one large torque/maui controlled cluster consisting of...

1. Single core nodes -- all departments
2. Dual core nodes -- all departments
3. Departmental nodes -- 4 nodes for chemistry (dual cores), 8 nodes for eScience (dual cores), 5 single core nodes for Sound/Vibration.

The nodes in (1) are older; all users can use them by default, and access is trivial. For the new nodes (2 and 3) we have devised a simple scheme to control access based on switch boundaries. For example, for nodes in (2), we have...

    NODECFG[purple301] FEATURES=switch10
    ...
    NODECFG[purple332] FEATURES=switch11
    ..

Switches 10 and 11 aren't defined in the maui NODESETLIST, so users must specify the appropriate switch(es) on their qsub command. Above all we want to ensure that jobs never grab a mix of nodes from (1) and (2). Clunky, but it works.

For nodes in (3), we have followed the same "switch" idea, but have also defined a standing reservation to limit user access. Users can also, of course, let their jobs spill over into the main facility by doing something like:

    qsub -W x=NODESET:ONEOF:FEATURE:switch10:switch11:escience ...

Overall I think this scheme is clunky and could be improved upon(?) -- we are writing a script to hide the details, however. Could anyone with more experience of setting up large systems please advise us by suggesting possible setups based on queues, partitions, etc.?

An interesting question comes to mind... in a torque/maui system, is it possible for queued jobs to migrate from one queue to another if resources are busy?

Thanks -- David.

_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
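To illustrate the kind of wrapper script mentioned above for hiding the switch details, here is a minimal sketch. It is not the script from the post: the function names and the pool names "dual" and "escience" are hypothetical, and the mapping of pools to features is an assumption. Only the feature names and the `-W x=NODESET:ONEOF:FEATURE:...` form come from the qsub example in the message. The sketch only prints the command, so it can be checked without submitting anything.

```shell
#!/bin/sh
# Hypothetical wrapper: translate a friendly pool name into the NODESET
# feature list, so users do not have to remember switch names.

# Map a pool name to a colon-separated feature list.
# Pool names are assumptions; feature names are those quoted in the post.
nodeset_for_pool() {
    case "$1" in
        dual)     echo "switch10:switch11" ;;
        escience) echo "switch10:switch11:escience" ;;
        *)        return 1 ;;
    esac
}

# Print (do not run) the qsub command for the given pool.
# Usage: build_qsub_cmd POOL [qsub arguments...]
build_qsub_cmd() {
    pool=$1
    shift
    features=$(nodeset_for_pool "$pool") || {
        echo "unknown pool: $pool" >&2
        return 1
    }
    echo "qsub -W x=NODESET:ONEOF:FEATURE:$features $*"
}
```

For example, `build_qsub_cmd escience myjob.sh` prints `qsub -W x=NODESET:ONEOF:FEATURE:switch10:switch11:escience myjob.sh`, and an unknown pool name returns a non-zero status with a message on stderr. Keeping the pool-to-feature mapping in one function means a switch can be renamed without users changing their habits.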
