A little more information: to see whether this was a general maui problem or 
just an issue with the torque/maui handoff, I set "JOBNODEMATCHPOLICY EXACTPROC" 
in maui.cfg.  In that case "checkjob -v" reports the ppn=16 nodes as "rejected : 
CPU", as expected.  So the problem appears to be in passing the per-job 
nmatchpolicy setting from torque to maui, not in the JOBNODEMATCHPOLICY 
parameter itself.  According to the documentation here:

http://www.adaptivecomputing.com/resources/docs/mwm/13.3rmextensions.php

I would expect to be able to use the torque 2.0+ "-l" syntax, but qsub rejects 
it and I have to fall back to the torque 1.0 "-W x=" syntax:

% qsub -l nodes=18:ppn=8,nmatchpolicy=exactproc test.pbs
qsub: Job rejected by all possible destinations
% qsub -l nodes=18:ppn=8 -W x=nmatchpolicy:exactproc test.pbs
36896.praesepe.jsc.nasa.gov
%

Any ideas why I can't use the "-l" syntax?  Is the "-l" syntax required with 
torque 2.0 or is the "-W x=" syntax still supposed to work?
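
As a rough way to confirm which kind of node a job actually landed on, the 
sketch below pulls the unique host names out of a torque exec_host value so 
each one can be checked with pbsnodes.  The helper name hosts_from_exec_host is 
made up for illustration, and it assumes the exec_host value has already been 
extracted onto a single line in the usual host/slot+host/slot form.

```shell
# Hypothetical helper (not part of torque or maui): print the unique host
# names found in an exec_host value such as "n012/7+n012/6+...+n012/0".
# Assumes the value is passed as one argument on a single line.
hosts_from_exec_host() {
    printf '%s\n' "$1" | tr '+' '\n' | sed 's|/.*||' | sort -u
}

# Possible use on the cluster (shown as comments, since it needs a live
# torque server): print the configured core count of each assigned host.
#   for h in $(hosts_from_exec_host "$exec_host"); do
#       pbsnodes "$h" | grep 'np = '
#   done
```

A host reporting np = 16 for a job submitted with ppn=8 and exactproc would 
indicate that the policy was not applied.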

On Feb 9, 2011, at 1:00 PM, Vicker, Darby (JSC-EG311) wrote:

> Hello,
> 
> We have a cluster with ppn=8 and ppn=16 nodes.  In general we want jobs 
> requesting ppn=8 nodes to run on the ppn=16 nodes if they are free (i.e. 
> JOBNODEMATCHPOLICY = EXACTNODE).  But we'd like the users to be able to 
> specify JOBNODEMATCHPOLICY = EXACTPROC to constrain their jobs to the ppn=8 
> nodes.  This should be possible but I can't get it working.  If I submit the 
> following script:
> 
> #! /bin/csh -f
> #PBS -S /bin/csh 
> #PBS -N TEST
> #PBS -r n -j oe
> #PBS -l nodes=1:ppn=8
> #PBS -l walltime=2:00:00
> #PBS -W x=nmatchpolicy:exactproc
> 
> cd $PBS_O_WORKDIR
> 
> env > env.txt
> qstat -f $PBS_JOBID > qstat.txt
> 
> it will run on a ppn=16 node, even though nmatchpolicy shows up in the 
> torque job attributes and the maui log.
> 
> % tail -1 qstat.txt
> x = nmatchpolicy:exactproc
> %  grep -i match /usr/local/maui/log/maui.log
> 02/08 16:39:41 MUGetIndex(nmatchpolicy:exactproc,ValList,0)
> % 
> 
> 
> We are running maui 3.2.6p21 and torque 2.3.6.  Any ideas on how to debug 
> this further and correct the problem?
> 
> Thanks,
> Darby
> 

_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
