Do you have any tricks for diagnosing jobs that are held with NoResources? I have a job that keeps being put into this state, but I can't see which resources are missing. It is a 2001-core job, yet showq shows 3164 cores on the system. I had the user drop the memory requirement to see whether it would go through, and as far as I can see, nothing else was requested.
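Comparing the requested core count against the total in showq can be misleading. A job can only use processors on nodes that are idle and that satisfy every per-task requirement, and a dedicated-node policy may rule out partially busy nodes altogether. The toy tally below illustrates that counting. It is a minimal sketch under stated assumptions: the `Node` records, their numbers, and the `whole_nodes_only` switch are hypothetical, and this is not Maui's actual reservation algorithm. Real per-node values would have to come from the scheduler's node reports (for example `checknode`).

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical per-node snapshot; real values would come from the
    # scheduler's node reports (for example `checknode`).
    name: str
    total_procs: int
    idle_procs: int
    idle_mem_mb: int


def feasible_tasks(nodes, procs_per_task, mem_per_task_mb, whole_nodes_only=False):
    """Count tasks of one shape that fit on the idle part of each node.

    whole_nodes_only is only a rough, assumed stand-in for a dedicated-node
    policy: a node with any busy processor contributes nothing.
    """
    total = 0
    for node in nodes:
        if whole_nodes_only and node.idle_procs < node.total_procs:
            continue  # partially busy node cannot be handed over whole
        fit_by_procs = node.idle_procs // procs_per_task
        fit_by_mem = node.idle_mem_mb // mem_per_task_mb
        total += min(fit_by_procs, fit_by_mem)  # the tighter limit wins
    return total


if __name__ == "__main__":
    nodes = [
        Node("n1", total_procs=16, idle_procs=16, idle_mem_mb=32768),
        Node("n2", total_procs=16, idle_procs=8, idle_mem_mb=16384),
        Node("n3", total_procs=16, idle_procs=16, idle_mem_mb=8192),
    ]
    # 40 idle processors in total, but fewer 1-proc tasks actually fit.
    print(feasible_tasks(nodes, 1, 1024))                         # 32
    print(feasible_tasks(nodes, 1, 1024, whole_nodes_only=True))  # 24
    print(feasible_tasks(nodes, 1, 1500))                         # 29
```

The last call uses 1500 MB per task, the queue's `resources_default.pmem`, to show how a larger per-task memory figure shrinks the count even when processors are free. The toy numbers only make the point that an idle-processor total overstates what a job of a given shape can actually use.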
Here is what checkjob reports for the job:

checking job 141698

State: Idle EState: Deferred
Creds: user:user group:group class:default qos:dedicated
WallTime: 00:00:00 of 6:00:00:00
SubmitTime: Mon Aug 15 16:00:56
(Time Queued Total: 17:31:43 Eligible: 00:22:39)

Total Tasks: 2001

Req[0] TaskCount: 2001 Partition: ALL
Network: [NONE] Memory >= 1024M Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [default]
Dedicated Resources Per Task: PROCS: 1 MEM: 1024M

IWD: [NONE] Executable: [NONE]
Bypass: 0 StartCount: 0
PartitionMask: [ALL]
Flags: RESTARTABLE PREEMPTEE DEDICATEDNODE
Attr: PREEMPTEE
job is deferred. Reason: NoResources (cannot create reservation for job '141698' (intital reservation attempt) )
Holds: Batch Defer (hold reason: NoResources)
PE: 2001.00 StartPriority: 22
cannot select job 141698 for partition DEFAULT (job hold active)

These are the logs after releasing the hold:

08/16 09:30:05 MQueueScheduleIJobs(Q,DEFAULT)
08/16 09:30:05 INFO: 2988 feasible tasks found for job 141698:0 in partition DEFAULT (2001 Needed)
08/16 09:30:05 MJobPReserve(141698,DEFAULT,ResCount,ResCountRej)
08/16 09:30:05 INFO: 2988 feasible tasks found for job 141698:0 in partition DEFAULT (2001 Needed)
08/16 09:30:05 ALERT: job 141698 cannot run in any partition
08/16 09:30:05 ALERT: cannot create new reservation for job 141698 (shape[1] 2001)
08/16 09:30:05 ALERT: cannot create new reservation for job 141698
08/16 09:30:05 ALERT: job '141698' cannot run (deferring job for 300 seconds)
08/16 09:30:05 INFO: batch hold placed on job '141698', reason: 'NoResources'
08/16 09:30:05 MSysRegEvent(JOBHOLD: batch hold placed on job '141698'. defercount: 33 reason: 'NoResources',0,0,1)
08/16 09:30:05 MSysLaunchAction(ASList,1)
08/16 09:30:05 WARNING: cannot reserve priority job '141698'
Active Jobs------ ------------------
08/16 09:30:05 INFO: resources available after scheduling: N: 185 P: 1908

This was submitted to the default queue, whose default QOS is dedicated. From maui.cfg:

QOSCFG[dedicated] QFLAGS=PREEMPTEE:DEDICATED
CLASSCFG[default] QDEF=dedicated

And the queue definition from qmgr:

create queue default
set queue default queue_type = Execution
set queue default resources_default.pmem = 1500mb
set queue default enabled = True
set queue default started = True

_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
