Hi,

I also could not get GRES working (my use case is also GPUs, i.e. node-locked consumable resources). Eventually I found some time to dig into the Maui sources. The starting point was this patch:

http://www.clusterresources.com/pipermail/mauiusers/2008-August/003486.html

The aforementioned patch had already been applied in 3.3.1, but even when I used -lsoftware instead of -W x="GRES.." it did not work. So I dug further and realized that "x=GRES" was not even parsed in Maui. Some code was also missing elsewhere. I tried to add the missing parts (see the attached patch; I was using Maui 3.3.1). It seems to work (DISCLAIMER: I'm not a Maui developer).

There is at least one caveat: GRES are requested on a per-task basis. For example, imagine that you have 8-core machines with 2 GPUs each, and an application that uses:

1. one CPU core, one GPU:
       qsub -W x='GRES:gpu@1'    # works
2. one CPU core and both GPUs on one machine:
       qsub -lnodes=1:ppn=1 -W x='GRES:gpu@2'    # works
3. two GPUs on each of two hosts:
       qsub -lnodes=2:ppn=1 -W x='GRES:gpu@2'    # works
4. all GPUs and all CPU cores on two hosts:
       qsub -lnodes=2:ppn=8 -W x='GRES:gpu@1'    # does not work

The last case fails because the job requests 16 GPUs on two hosts. However, if you request exclusive access to the machines, you do not need to specify GRES at all.

Does anyone know what the official process for submitting a patch is? I know there is a Bugzilla at clusterresources.com, but it seems to be dedicated to Torque only.

Cheers,

On 23 March 2011 17:09, Mike Mosley <[email protected]> wrote:
> All,
>
> I've seen several posts regarding what seems to be an inability to get Maui
> to work with Generic Resources (GRES). Does anybody have this working and
> if so what are the steps you used to configure it?
>
> My environment:
> Torque 2.5.5
> Maui 3.3.1
>
> I have a number of compute nodes which have 3 GPUs each.
>
> I created the following entries in maui.cfg
> NODECFG[compute1] GRES=ngpus:3
> NODECFG[compute2] GRES=ngpus:3
> etc. etc.
>
> I then tried submitting a job along the lines of:
> qsub -l nodes=1 -W x="GRES:ngpus@3" my_script
>
> The job gets scheduled and executed on a compute node and the ngpus
> specification is ignored. By that, I mean that I can take the resource
> definition out for compute2 and the job may still get
> scheduled there even though I've asked for a node with that resource in my
> qsub command.
>
> Mike
> _______________________________________________
> mauiusers mailing list
> [email protected]
> http://www.supercluster.org/mailman/listinfo/mauiusers
>

--
Mariusz
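
As a rough illustration of the per-task accounting described in the reply above (this is not taken from the Maui sources or the attached patch), the sketch below assumes that a job has nodes * ppn tasks and that the GRES count is multiplied by the number of tasks. The function names and the default of 2 GPUs per node are hypothetical and chosen only to match the four example submissions.

```python
# Sketch only: models "GRES are requested per task" under the assumption
# that tasks = nodes * ppn. Names here are illustrative, not Maui APIs.

def gpus_requested(nodes: int, ppn: int, gres_per_task: int) -> int:
    """Total GPUs the job asks for when GRES is counted once per task."""
    tasks = nodes * ppn
    return tasks * gres_per_task


def fits(nodes: int, ppn: int, gres_per_task: int, gpus_per_node: int = 2) -> bool:
    """True if the request does not exceed the GPUs present on the nodes."""
    return gpus_requested(nodes, ppn, gres_per_task) <= nodes * gpus_per_node


if __name__ == "__main__":
    # (nodes, ppn, gres_per_task) for the four qsub examples in the message
    cases = [(1, 1, 1), (1, 1, 2), (2, 1, 2), (2, 8, 1)]
    for nodes, ppn, gres in cases:
        print(nodes, ppn, gres, gpus_requested(nodes, ppn, gres), fits(nodes, ppn, gres))
```

Under these assumptions the fourth case asks for 16 GPUs while two hosts provide only 4, which is consistent with that job not being schedulable.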
gres-maui.patch
Description: Binary data
