[Mauiusers] jobs not starting when resources are available

Arnau Bria Tue, 07 Oct 2008 00:42:21 -0700

Hi all,

Some jobs stay at the top of the IDLE job list and keep the rest from starting
(including jobs from other queues that have nothing to do with these ones).


Looking at them, I can see that resources are available for them to run, but
they don't start:


]# checkjob -v 672949


checking job 672949 (RM job '672949.pbs02.pic.es')

State: Idle
Creds:  user:iatprd045  group:iatprd  class:ifae  qos:ilhcatlas
WallTime: 00:00:00 of 3:00:00:00
SubmitTime: Tue Oct  7 06:35:52
  (Time Queued  Total: 3:02:20  Eligible: 1:20:42)

Total Tasks: 1

Req[0]  TaskCount: 1  Partition: ALL
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [ifae]
Exec:  ''  ExecSize: 0  ImageSize: 0
Dedicated Resources Per Task: PROCS: 1
NodeAccess: SHARED
NodeCount: 0


IWD: [NONE]  Executable:  [NONE]
Bypass: 12  StartCount: 0
PartitionMask: [ALL]
SystemQueueTime: Tue Oct  7 08:17:30

PE:  1.00  StartPriority:  82
job can run in partition DEFAULT (17 procs available.  1 procs required)


]# diagnose -j 672949
Name                  State Par Proc QOS     WCLimit R  Min     User    Group    Account  QueuedTime  Network  Opsys   Arch    Mem   Disk  Procs       Class  Features

672949                 Idle ALL    1 ilh  3:00:00:00 0    1 iatprd04   iatprd        -     1:22:43   [NONE] [NONE] [NONE]    >=0    >=0    NC0    [ifae:1]  [ifae]


There are some nodes where they could start:

td204.pic.es
     state = free
     np = 4
     properties = ifae
--

td203.pic.es
     state = free
     np = 4
     properties = ifae


# checknode td204.pic.es


checking node td204.pic.es

State:   Running  (in current state for 00:00:00)
Expected State:     Idle   SyncDeadline: Sat Oct 24 14:26:40
Configured Resources: PROCS: 4  MEM: 8115M  SWAP: 8115M  DISK: 15G
Utilized   Resources: DISK: 4752M
Dedicated  Resources: PROCS: 3
Opsys:         linux  Arch:      [NONE]
Speed:      1.00  Load:       3.000
Network:    [DEFAULT]
Features:   [ifae]
Attributes: [Batch]
Classes:    [long 4:4][medium 4:4][short 4:4][ifae 1:4][gshort 4:4][glong 4:4][gmedium 4:4][lhcbsl4 4:4][magic 4:4][roman 4:4]

Total Time: 58:11:34:08  Up: 58:10:24:24 (99.92%)  Active: 41:19:36:22 (71.50%)

Reservations:
  Job '672291'(x1)  -6:17:17 -> 2:17:42:43 (3:00:00:00)
  Job '672297'(x1)  -6:15:47 -> 2:17:44:13 (3:00:00:00)
  Job '672924'(x1)  -3:05:22 -> 2:20:54:38 (3:00:00:00)
JobList:  672291,672297,672924
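The `checknode` output can be compared directly against what `checkjob` says the job needs (1 proc, feature `ifae`, class `ifae`): 4 configured procs minus 3 dedicated leaves 1 free. A minimal Python sketch of that check, assuming the `[name N:M]` entries on the Classes line mean N of M class slots are available; the parsing helpers are hypothetical and only handle the lines quoted here.

```python
import re

# Hypothetical check (not a Maui API): parse the checknode lines quoted in
# the post and test whether a job's proc/feature/class request fits.
CHECKNODE_EXCERPT = """\
Configured Resources: PROCS: 4  MEM: 8115M  SWAP: 8115M  DISK: 15G
Utilized   Resources: DISK: 4752M
Dedicated  Resources: PROCS: 3
Features:   [ifae]
Classes:    [long 4:4][medium 4:4][short 4:4][ifae 1:4][gshort 4:4][glong 4:4][gmedium 4:4][lhcbsl4 4:4][magic 4:4][roman 4:4]
"""


def procs(kind, text):
    """PROCS value on the '<kind> Resources:' line, or 0 if absent."""
    m = re.search(rf"^{kind}\s+Resources:.*?PROCS:\s*(\d+)", text, re.MULTILINE)
    return int(m.group(1)) if m else 0


def features(text):
    """Bracketed entries on the Features line, e.g. ['ifae']."""
    m = re.search(r"^Features:(.*)$", text, re.MULTILINE)
    return re.findall(r"\[([^\]]+)\]", m.group(1)) if m else []


def class_slots(text):
    """{class: (available, configured)}, assuming '[name N:M]' means N of M free."""
    m = re.search(r"^Classes:(.*)$", text, re.MULTILINE)
    pairs = re.findall(r"\[(\S+) (\d+):(\d+)\]", m.group(1)) if m else []
    return {name: (int(a), int(t)) for name, a, t in pairs}


def job_fits(text, procs_needed, feature, job_class):
    """True if free procs, node features and class slots all cover the request."""
    free = procs("Configured", text) - procs("Dedicated", text)
    avail, _ = class_slots(text).get(job_class, (0, 0))
    return free >= procs_needed and feature in features(text) and avail >= procs_needed


print(procs("Configured", CHECKNODE_EXCERPT) - procs("Dedicated", CHECKNODE_EXCERPT))  # 1
print(job_fits(CHECKNODE_EXCERPT, 1, "ifae", "ifae"))  # True
```

A `True` result only says the node has room for the job on paper, which agrees with the `job can run in partition DEFAULT` line from `checkjob`; it does not explain why the scheduler is not starting it.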


]# diagnose -n td204.pic.es
diagnosing node table (5120 slots)
Name                    State  Procs     Memory         Disk          Swap      Speed  Opsys   Arch Par   Load Res Classes                        Network       Features

td204.pic.es          Running   1:4     8115:8115    10635:15387    8115:8115    1.00  linux [NONE] DEF   3.00 003 [long_4:4][medium_4:4][short_4 [DEFAULT]     [ifae]
-----                     ---   1:4     8115:8115    10635:15387    8115:8115

Total Nodes: 1  (Active: 1  Idle: 0  Down: 0)




If I force them (runjob), they start, but meanwhile I have a very long
queue with many jobs in other queues that could also start.

Where should I start looking for the source of this problem?


Cheers,
Arnau
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers