What happens if you just do a simple qsub like this:
qsub -I -l nodes=fu48core.esl

?

We define features for every node. I think the reason you might be having trouble is that your node has no features defined. For example, in pbs/server_priv/nodes we have:

bh001 np=4 compute

Then set a queue attribute of:

resources_default.neednodes = compute

for the particular queue. From there, Maui will query Torque and know that node bh001 has the compute feature, so when you submit a job to that queue, it should be mapped to bh001 via the node features. I'm actually not sure whether you can submit jobs and have them run on nodes without defining node features.

On Jul 20, 2011, at 6:59 PM, Caleb Phillips wrote:

> Hello all:
>
> I'm running torque 2.3.6 (packaged with Ubuntu 10.10) and maui 3.3.1.
> I'm having an issue where submitted jobs sit in the queue indefinitely.
> This was occurring with pbs_sched, so I installed maui hoping it would
> fix the problem. With maui, I have more information about the problem,
> but no resolution. I've spent several hours searching the torqueusers
> and mauiusers mailing lists, and reading the manuals, to no avail. I
> hope you can help...
>
> As far as I can tell, maui is complaining that there are not sufficient
> "feasible procs" for jobs to run because of a lack of "features". My
> nodes have no features enabled, and I'm not requesting any with my jobs.
> Yet, the jobs show up with "[1][ppn=1]" in the feature list. I don't
> know where these features are coming from or how to unset them, or if
> that's really the source of the problem (it's simply my best guess). Any
> ideas?
>
> Here's more information on my setup and how I reproduce the problem:
>
> I have one node (currently online).
> It has 48 processors:
>
>> caleb@torqueserver:~$ qnodes
>> fu48core.esl
>> state = free
>> np = 48
>> ntype = cluster
>> status = opsys=linux,uname=Linux 48core 2.6.32-25-server #45-Ubuntu SMP
>> Sat Oct 16 20:06:58 UTC 2010 x86_64,sessions=2834 5874 12296 13555 19465
>> 17575,nsessions=6,nusers=3,idletime=2308,totmem=82007668kb,availmem=73380372kb,physmem=82007668kb,ncpus=48,loadave=2.19,netload=24944834533,state=free,jobs=,varattr=,rectime=1311202191
>
> It's free and presumably happy:
>
>> caleb@torqueserver:/usr/local/maui$ checknode fu48core
>>
>> checking node fu48core.esl
>>
>> State: Idle (in current state for 5:15:40)
>> Configured Resources: PROCS: 48 MEM: 78G SWAP: 78G DISK: 1M
>> Utilized Resources: SWAP: 8426M
>> Dedicated Resources: [NONE]
>> Opsys: linux Arch: [NONE]
>> Speed: 1.00 Load: 2.240
>> Network: [DEFAULT]
>> Features: [NONE]
>> Attributes: [Batch]
>> Classes: [batch 48:48][amplhack 48:48][qualnet 48:48][lightweight 48:48]
>>
>> Total Time: 6:19:49 Up: 6:19:49 (100.00%) Active: 00:00:00 (0.00%)
>>
>> Reservations:
>> NOTE: no reservations on node
>
> The batch queue is empty. If I submit a very basic job (I've tried more
> complicated jobs too, with specific resource requests), it gets deferred
> immediately:
>
>> caleb@torqueserver:/usr/local/maui$ echo "sleep 30" | qsub
>> 25.torqueserver.esl
>> caleb@torqueserver:/usr/local/maui$ checkjob 25
>> checking job 25
>>
>> State: Idle EState: Deferred
>> Creds: user:caleb group:abelian class:batch qos:DEFAULT
>> WallTime: 00:00:00 of 1:00:00:00
>> SubmitTime: Wed Jul 20 16:52:37
>> (Time Queued Total: 00:00:31 Eligible: 00:00:00)
>>
>> Total Tasks: 1
>>
>> Req[0] TaskCount: 1 Partition: ALL
>> Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
>> Opsys: [NONE] Arch: [NONE] Features: [1][ppn=1]
>> NodeCount: 1
>>
>> IWD: [NONE] Executable: [NONE]
>> Bypass: 0 StartCount: 0
>> PartitionMask: [ALL]
>> Flags: RESTARTABLE
>>
>> job is deferred.
>> Reason: NoResources (cannot create reservation for job
>> '25' (intital reservation attempt)
>> )
>> Holds: Defer (hold reason: NoResources)
>> PE: 1.00 StartPriority: 1
>> cannot select job 25 for partition DEFAULT (job hold active)
>
> If I release the job, I can see that maui's complaining about a lack of
> feasible procs due to unavailable features:
>
>> caleb@torqueserver:/usr/local/maui$ releasehold 25
>>
>> job holds adjusted
>> caleb@torqueserver:/usr/local/maui$ checkjob -v 25
>>
>>
>> checking job 25 (RM job '25.torqueserver.esl')
>>
>> State: Idle
>> Creds: user:caleb group:abelian class:batch qos:DEFAULT
>> WallTime: 00:00:00 of 1:00:00:00
>> SubmitTime: Wed Jul 20 16:52:37
>> (Time Queued Total: 00:04:39 Eligible: 00:02:35)
>>
>> Total Tasks: 1
>>
>> Req[0] TaskCount: 1 Partition: ALL
>> Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
>> Opsys: [NONE] Arch: [NONE] Features: [1][ppn=1]
>> Exec: '' ExecSize: 0 ImageSize: 0
>> Dedicated Resources Per Task: PROCS: 1
>> NodeAccess: SHARED
>> NodeCount: 1
>>
>>
>> IWD: [NONE] Executable: [NONE]
>> Bypass: 0 StartCount: 0
>> PartitionMask: [ALL]
>> Flags: RESTARTABLE
>>
>> Messages: cannot create reservation for job '25' (intital reservation
>> attempt)
>>
>> PE: 1.00 StartPriority: 2
>> job cannot run in partition DEFAULT (idle procs do not meet requirements : 0
>> of 1 procs found)
>> idle procs: 48 feasible procs: 0
>>
>> Rejection Reasons: [Features : 1]
>>
>> Detailed Node Availability Information:
>>
>> fu48core.esl rejected : Features
>
> There are no error messages in the torque server_log, maui's log file,
> or the node's mom_log. In fact, my node never even sees the job since
> maui never decides to run it.
>
> Any help you can provide would be extremely helpful. Thanks!
>
> --
> Caleb Phillips, Ph.D. Candidate
> Computer Science Department
> University of Colorado, Boulder
> _______________________________________________
> mauiusers mailing list
> [email protected]
> http://www.supercluster.org/mailman/listinfo/mauiusers

----------------------
Steve Crusan
System Administrator
Center for Research Computing
University of Rochester
https://www.crc.rochester.edu/
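For anyone trying the feature mapping suggested in the reply, here is a minimal shell sketch, assuming a Torque nodes file in the `name np=N feature ...` format shown above (e.g. `bh001 np=4 compute`). `nodes_with_feature` is a hypothetical helper, not part of Torque or Maui; it only lists which nodes carry a given feature, i.e. the nodes a queue with `resources_default.neednodes = compute` could be mapped to. The queue name `batch` in the final comment is an assumption.

```shell
#!/bin/sh
# Hypothetical helper (not part of Torque or Maui): print the names of nodes
# in a Torque-style nodes file that carry the given feature.
# Each non-comment line is assumed to look like: <name> [np=<N>] [feature ...]
nodes_with_feature() {
    nodes_file=$1
    feature=$2
    awk -v want="$feature" '
        NF == 0 || $1 ~ /^#/ { next }      # skip blank lines and comments
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /=/) continue     # skip attributes such as np=4
                if ($i == want) { print $1; next }
            }
        }
    ' "$nodes_file"
}

# Example: one feature-tagged node and one untagged node (like fu48core.esl).
example=$(mktemp)
cat > "$example" <<'EOF'
bh001 np=4 compute
fu48core.esl np=48
EOF

nodes_with_feature "$example" compute   # prints: bh001
rm -f "$example"

# The queue side of the mapping would be set on the Torque server with
# something like (queue name "batch" assumed):
#   qmgr -c "set queue batch resources_default.neednodes = compute"
```

An untagged node such as fu48core.esl never appears in the output, which mirrors why a queue that requires a feature cannot be mapped to it.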
