Sorry, I just reread my original post. The description there was missing the '+' sign, but my actual tests did use it. So qsub -l nodes=1:ppn=12+1:ppn=1 works, while qsub -l nodes=3:ppn=12+1:ppn=1 does not (the job goes to idle).
Weird stuff. Has anyone else encountered this?

Regards,
Marvin

On Fri, Mar 25, 2011 at 10:46 AM, Marvin Novaglobal <[email protected]> wrote:

> Hi Peter,
>     It doesn't work for my setup. I meant it only applies to nodes=3 and
> nodes=5 so far. We don't have enough resources to test on nodes=7. So again,
> qsub -l nodes=1:ppn=12+1:ppn=1 will work but
> qsub -l nodes=3:ppn=12+1:ppn=1 will not work
>     May I know which version of Maui and Torque you are using? Your Maui
> and Torque's config also please.
>
> Regards,
> Marvin
>
>
> On Fri, Mar 25, 2011 at 12:20 AM, Peter Michael Crosta <[email protected]> wrote:
>
>> Hi Marvin,
>>
>> I have gotten multiple resource requests to work by using the "+" sign.
>> Have you tried
>>
>> qsub -l nodes=3:ppn=12+1:ppn=1 ?
>>
>> Best,
>> Peter
>>
>>
>> On Thu, 24 Mar 2011, Marvin Novaglobal wrote:
>>
>>> Hi,
>>> On my setup,
>>> $ qsub -l nodes=1:ppn=12:1:ppn=1 (works)
>>> $ qsub -l nodes=2:ppn=12:1:ppn=1 (works)
>>> $ qsub -l nodes=3:ppn=12:1:ppn=1 (job goes to idle and never get executed)
>>> $ qsub -l nodes=4:ppn=12:1:ppn=1 (works)
>>> $ qsub -l nodes=5:ppn=12:1:ppn=1 (job goes to idle and never get executed)
>>>
>>> <Maui.cfg>
>>> ...
>>> ENABLEMULTINODEJOBS[0] TRUE
>>> ENABLEMULTIREQJOBS[0] TRUE
>>> JOBNODEMATCHPOLICY[0] EXACTNODE
>>> NODEALLOCATIONPOLICY[0] MINRESOURCE
>>>
>>>
>>> <Torque.cfg>
>>> set server scheduling = True
>>> set server acl_hosts = aquarius.local
>>> set server managers = torque@aquarius
>>> set server operators = torque@aquarius
>>> set server default_queue = DEFAULT
>>> set server log_events = 511
>>> set server mail_from = adm
>>> set server resources_available.nodect = 2048
>>> set server scheduler_iteration = 600
>>> set server node_check_rate = 150
>>> set server tcp_timeout = 6
>>> set server mom_job_sync = True
>>> set server keep_completed = 300
>>> set server next_job_number = 377
>>>
>>> <maui.log>
>>> 03/24 20:23:48 MResDestroy(377)
>>> 03/24 20:23:48 MResChargeAllocation(377,2)
>>> 03/24 20:23:48 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,EVERY,FReason,TRUE)
>>> 03/24 20:23:48 INFO: total jobs selected in partition ALL: 1/1
>>> 03/24 20:23:48 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,DEFAULT,FReason,TRUE)
>>> 03/24 20:23:48 INFO: total jobs selected in partition DEFAULT: 1/1
>>> 03/24 20:23:48 MQueueScheduleIJobs(Q,DEFAULT)
>>> 03/24 20:23:48 INFO: 72 feasible tasks found for job 377:0 in partition DEFAULT (36 Needed)
>>> 03/24 20:23:48 INFO: 72 feasible tasks found for job 377:1 in partition DEFAULT (1 Needed)
>>> 03/24 20:23:48 ALERT: inadequate tasks to allocate to job 377:1 (0 < 1)
>>> 03/24 20:23:48 ERROR: cannot allocate nodes to job '377' in partition DEFAULT
>>> 03/24 20:23:48 MJobPReserve(377,DEFAULT,ResCount,ResCountRej)
>>> 03/24 20:23:48 MJobReserve(377,Priority)
>>> 03/24 20:23:48 INFO: 72 feasible tasks found for job 377:0 in partition DEFAULT (36 Needed)
>>> 03/24 20:23:48 INFO: 72 feasible tasks found for job 377:1 in partition DEFAULT (1 Needed)
>>> 03/24 20:23:48 INFO: 72 feasible tasks found for job 377:0 in partition DEFAULT (36 Needed)
>>> 03/24 20:23:48 INFO: 72 feasible tasks found for job 377:1 in partition DEFAULT (1 Needed)
>>> 03/24 20:23:48 INFO: located resources for 36 tasks (144) in best partition DEFAULT for job 377 at time 00:00:01
>>> 03/24 20:23:48 INFO: tasks located for job 377: 37 of 36 required (144 feasible)
>>> 03/24 20:23:48 MResJCreate(377,MNodeList,00:00:01,Priority,Res)
>>> 03/24 20:23:48 INFO: job '377' reserved 36 tasks (partition DEFAULT) to start in 00:00:01 on Thu Mar 24 20:23:49 (WC: 2592000)
>>>
>>> <pbs_server.log>
>>> 03/24/2011 20:23:17;0100;PBS_Server;Job;377.aquarius;enqueuing into DEFAULT, state 1 hop 1
>>> 03/24/2011 20:23:17;0008;PBS_Server;Job;377.aquarius;Job Queued at request of torque@aquarius, owner = torque@aquarius, job name = parallel.sh, queue = DEFAULT
>>> 03/24/2011 20:23:17;0040;PBS_Server;Svr;aquarius;Scheduler was sent the command new
>>>
>>>
>>> Anyone encounter problem with multiple job requests?
>>>
>>>
>>> Regards,
>>> Marvin
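
For anyone comparing the request against the "Needed" counts in the maui.log excerpt, the arithmetic can be checked with a small Python sketch. This is only an illustration: `parse_nodes_spec` and `tasks_per_requirement` are hypothetical helper names, not part of Torque or Maui, and the sketch assumes each '+'-separated requirement has the numeric `count:ppn=N` form used in this thread (hostnames and node properties are not handled; a missing ppn is assumed to mean 1).

```python
def parse_nodes_spec(spec):
    """Split a '-l nodes=' value such as '3:ppn=12+1:ppn=1' into
    (node_count, ppn) pairs, one pair per '+'-separated requirement.

    Hypothetical helper for illustration; handles only numeric node counts.
    """
    requirements = []
    for chunk in spec.split("+"):
        fields = chunk.split(":")
        count = int(fields[0])          # number of nodes in this requirement
        ppn = 1                         # assumption: ppn is 1 when omitted
        for field in fields[1:]:
            if field.startswith("ppn="):
                ppn = int(field[len("ppn="):])
        requirements.append((count, ppn))
    return requirements


def tasks_per_requirement(spec):
    """Return the task count (nodes * ppn) for each requirement."""
    return [count * ppn for count, ppn in parse_nodes_spec(spec)]
```

For `3:ppn=12+1:ppn=1` this gives 36 tasks for the first requirement and 1 for the second, 37 in total, which lines up with the "(36 Needed)" and "(1 Needed)" entries for job 377:0 and 377:1 in the log. It does not explain why the scheduler then reports "0 < 1" for the second requirement; it only confirms that the request itself was parsed as two requirements of the expected size.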
_______________________________________________ mauiusers mailing list [email protected] http://www.supercluster.org/mailman/listinfo/mauiusers
