Hi, sorry for the late reply.
My config is as follows: four compute nodes with np=2 each, two with the "gige"
feature and two with "ib". You'll find the config files below.
Submit three jobs like this:
echo sleep 1000 | qsub -lnodes=2:ppn=2:gige,walltime=3000
Wait for the first job to start; the other two should remain queued.
Then submit four jobs like this:
echo sleep 1000 | qsub -lnodes=1,walltime=3000
You should then see three of these jobs start and the fourth stay queued,
leaving one job slot unused:
ACTIVE JOBS--------------------
JOBNAME  USERNAME  STATE    PROC  REMAINING  STARTTIME
189      royd      Running     4   00:43:43  Sat Mar 28 23:47:52
192      royd      Running     1   00:45:21  Sat Mar 28 23:49:30
193      royd      Running     1   00:45:52  Sat Mar 28 23:50:01
194      royd      Running     1   00:45:52  Sat Mar 28 23:50:01

4 Active Jobs    7 of 8 Processors Active (87.50%)
                 4 of 4 Nodes Active (100.00%)

IDLE JOBS----------------------
JOBNAME  USERNAME  STATE    PROC  WCLIMIT    QUEUETIME
190      royd      Idle        4   00:50:00  Sat Mar 28 23:47:52
191      royd      Idle        4   00:50:00  Sat Mar 28 23:47:53
195      royd      Idle        1   00:50:00  Sat Mar 28 23:49:31

3 Idle Jobs
This illustrates the behaviour we see on our production cluster without the
maui patch I submitted earlier.
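
In case it helps anyone reproduce this, the submission steps above could be scripted roughly as follows. This is only a sketch: the helper names (qsub_command, build_plan, submit) are made up for illustration, it passes the resource list as a separate "-l" argument rather than attached to the flag, and by default it only prints the commands instead of calling qsub.

```python
import subprocess

WALLTIME = 3000  # seconds, same value as in the qsub commands above


def qsub_command(resources):
    """Return the qsub argument list for one resource request (hypothetical helper)."""
    return ["qsub", "-l", f"{resources},walltime={WALLTIME}"]


def build_plan():
    """Three 2-node gige jobs, then four single-node jobs, as described above."""
    wide = [qsub_command("nodes=2:ppn=2:gige") for _ in range(3)]
    small = [qsub_command("nodes=1") for _ in range(4)]
    return wide, small


def submit(argv, dry_run=True):
    """Feed 'sleep 1000' to qsub on stdin; with dry_run only print the command."""
    if dry_run:
        print("echo sleep 1000 |", " ".join(argv))
        return None
    done = subprocess.run(argv, input="sleep 1000\n", text=True,
                          capture_output=True, check=True)
    return done.stdout.strip()  # qsub prints the new job id


if __name__ == "__main__":
    wide, small = build_plan()
    for argv in wide:
        submit(argv)
    # In a real run, pause here until the first job is Running
    # (check showq by hand) before submitting the small jobs.
    for argv in small:
        submit(argv)
```

Call submit(argv, dry_run=False) on a node where qsub is available to actually submit the jobs.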
Here is the nodes file from my 4-node test cluster:
compute-0-0 np=2 ib
compute-0-1 np=2 ib
compute-0-2 np=2 gige
compute-0-3 np=2 gige
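
For reference, the slot arithmetic behind the showq output above can be checked with a few lines of Python. This is a simplified count only: it sums the np= values from the nodes file and the PROC column from showq, and it deliberately ignores node features and Maui's scheduling policies, so it shows that an idle job is small enough for the free slot, not why Maui leaves it queued. The function names are made up for illustration.

```python
NODES_FILE = """\
compute-0-0 np=2 ib
compute-0-1 np=2 ib
compute-0-2 np=2 gige
compute-0-3 np=2 gige
"""

# PROC column of the running and idle jobs in the showq output above
RUNNING = {"189": 4, "192": 1, "193": 1, "194": 1}
IDLE = {"190": 4, "191": 4, "195": 1}


def total_slots(nodes_text):
    """Sum the np= values over all lines of a nodes file."""
    total = 0
    for line in nodes_text.splitlines():
        for field in line.split()[1:]:
            if field.startswith("np="):
                total += int(field[3:])
    return total


def summarize(nodes_text, running, idle):
    """Return (slots, used, free, idle job ids that need no more than the free slots)."""
    slots = total_slots(nodes_text)
    used = sum(running.values())
    free = slots - used
    fitting = sorted(job for job, procs in idle.items() if procs <= free)
    return slots, used, free, fitting


slots, used, free, fitting = summarize(NODES_FILE, RUNNING, IDLE)
print(f"{used} of {slots} processors active ({100 * used / slots:.2f}%)")
print(f"free slots: {free}; idle jobs small enough to fit: {fitting}")
```

By this count job 195 needs one processor and one processor is free, which is the un-utilized slot mentioned above.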
And here is maui.cfg:
RMPOLLINTERVAL 00:00:30
JOBAGGREGATIONTIME 00:00:30
SERVERHOST hpc2.cc.uit.no
SERVERPORT 42559
SERVERMODE NORMAL
RMCFG[base] TYPE=PBS
ADMIN1 maui root
LOGFILE maui.log
LOGFILEMAXSIZE 10000000
LOGLEVEL 3
BACKFILLPOLICY FIRSTFIT
RESERVATIONPOLICY CURRENTHIGHEST
NODEACCESSPOLICY SINGLEUSER
And here is the torque config, i.e. the output of qmgr -c "print server":
$ qmgr -c "p s"
#
# Create queues and set their attributes.
#
#
# Create and define queue default
#
create queue default
set queue default queue_type = Execution
set queue default enabled = True
set queue default started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_host_enable = False
set server acl_hosts = hpc2.cc.uit.no
set server managers = [email protected]
set server managers += [email protected]
set server default_queue = default
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server next_job_number = 196