Hi,

I also could not get GRES working (in particular for the GPU use case,
i.e. node-locked consumable resources). Eventually I found some time to
dig into the Maui sources. The starting point was this patch:

http://www.clusterresources.com/pipermail/mauiusers/2008-August/003486.html

The aforementioned patch was already applied in 3.3.1, but even when I
used -lsoftware instead of -W x="GRES..." it did not work. So I dug
further and realized that "x=GRES" wasn't even parsed by Maui. Some
code was also missing elsewhere. I tried to add the missing parts (see
the attached patch, made against Maui 3.3.1). It seems to work
(DISCLAIMER: I'm not a Maui developer). There is at least one caveat:
GRES are requested on a per-task basis. E.g., imagine that you have
8-core machines with 2 GPUs each; if your application uses:
1. one CPU core, one GPU:

qsub -W x='GRES:gpu@1' #works

2. one CPU core, both GPUs on one machine:

qsub -lnodes=1:ppn=1 -W x='GRES:gpu@2' #works

3. two GPUs on each of two hosts:

qsub -lnodes=2:ppn=1 -W x='GRES:gpu@2' #works

4. all GPUs and all CPU cores on two hosts:

qsub -lnodes=2:ppn=8 -W x='GRES:gpu@1' #does not work

This fails because the job requests 16 GPUs on two hosts (8 tasks per
host, one GPU per task). But if you request exclusive access to whole
machines, you do not need to specify GRES at all.
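To make the per-task arithmetic concrete, here is a small shell sketch.
It assumes, as observed above with the patched Maui, that the count in
x='GRES:name@N' is charged once per task, so a node running PPN tasks
needs PPN * N units. The helper names gres_per_node and fits, and the
value GPUS_PER_NODE=2 (matching the 8-core/2-GPU example), are
hypothetical and only for illustration; this is not part of Maui.

```shell
#!/bin/sh
# Hypothetical helper, not part of Maui: estimates the GRES units a job
# needs on each allocated node, ASSUMING the GRES count is charged once
# per task (i.e. once per ppn slot), as observed with the patched Maui.

GPUS_PER_NODE=2   # assumption: nodes configured like NODECFG[...] GRES=gpu:2

# gres_per_node PPN COUNT -> prints GRES units needed on each node
gres_per_node() {
    echo $(( $1 * $2 ))
}

# fits PPN COUNT -> exit status 0 if a single node can satisfy the request
fits() {
    [ "$(gres_per_node "$1" "$2")" -le "$GPUS_PER_NODE" ]
}

# The "ppn count" pairs from cases 2, 3 and 4 above.
for req in "1 2" "1 2" "8 1"; do
    set -- $req   # split "PPN COUNT" into $1 and $2
    if fits "$1" "$2"; then verdict="fits"; else verdict="cannot be scheduled"; fi
    echo "ppn=$1 gpu@$2 -> $(gres_per_node "$1" "$2") GPU(s) per node: $verdict"
done
```

Case 4 comes out at 8 GPUs per node, which no 2-GPU node can satisfy;
that is exactly why the job never starts.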


Does anyone know what the official process for submitting a patch is?
I know there is a Bugzilla at clusterresources.com, but it seems to be
dedicated to Torque only.


Cheers,


On 23 March 2011 17:09, Mike Mosley <[email protected]> wrote:
> All,
>
> I’ve seen several posts regarding what seems to be an inability to get Maui
> to work with Generic Resources (GRES).  Does anybody have this working and
> if so what are the steps you used to configure it?
>
> My environment:
> Torque 2.5.5
> Maui 3.3.1
>
> I have a number of compute nodes which have 3 GPUs each.
>
> I created the following entries in maui.cfg
> NODECFG[compute1]               GRES=ngpus:3
> NODECFG[compute2]               GRES=ngpus:3
> etc. etc.
>
> I then tried submitting a job along the lines of:
> qsub -l nodes=1 -W x="GRES:ngpus@3" my_script
>
> The job gets scheduled and executed on a compute node and the ngpus
> specification is ignored.  By that, I mean that I can take the resource
> definition out for compute2 and the job may still get
> scheduled there even though I’ve asked for a node with that resource in my
> qsub command.
>
> Mike
> _______________________________________________
> mauiusers mailing list
> [email protected]
> http://www.supercluster.org/mailman/listinfo/mauiusers
>
>



-- 
Mariusz

Attachment: gres-maui.patch
Description: Binary data
