Hello,

I have jobs that should start running immediately on available resources, but 
at the moment they sit in the queue for relatively long periods of time, 
anywhere from 7 minutes to over 12 hours.  On the cluster in question only half 
of the nodes are being used by other jobs, so all new single-core jobs 
should start immediately.  For example, here is the checkjob and tracejob 
output for a 10-second job I submitted as a test:


# CHECKJOB OUTPUT ###########
checking job 2066650

State: Idle
Creds:  user:chrisbee  group:chrisbee  class:short  qos:DEFAULT
WallTime: 00:00:00 of 00:00:10
SubmitTime: Mon Oct  4 12:27:11
 (Time Queued  Total: 00:00:01  Eligible: 00:00:01)

Total Tasks: 1

Req[0]  TaskCount: 1  Partition: ALL
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [16GB]


IWD: [NONE]  Executable:  [NONE]
Bypass: 0  StartCount: 0
PartitionMask: [ALL]
Flags:       RESTARTABLE

PE:  1.00  StartPriority:  91
job can run in partition DEFAULT (128 procs available.  1 procs required)
##############################

# TRACEJOB OUTPUT ###########
Job: 2066650

10/04/2010 12:27:11  S    enqueuing into route, state 1 hop 1
10/04/2010 12:27:11  S    dequeuing from route, state QUEUED
10/04/2010 12:27:11  S    enqueuing into short, state 1 hop 1
10/04/2010 12:27:11  S    Job Queued at request of chris...@bloom,
                         owner = chris...@bloom, job name = STDIN,
                         queue = short
10/04/2010 12:27:11  A    queue=route
10/04/2010 12:27:11  A    queue=short
10/04/2010 12:33:54  S    Job Modified at request of m...@bloom
10/04/2010 12:33:54  S    Job Run at request of m...@bloom
10/04/2010 12:33:54  S    Job Modified at request of m...@bloom
10/04/2010 12:33:54  S    Exit_status=0 resources_used.cput=00:00:00
                         resources_used.mem=0kb resources_used.vmem=0kb
                         resources_used.walltime=00:00:00
10/04/2010 12:33:54  A    user=chrisbee group=chrisbee jobname=STDIN queue=short
                         ctime=1286220431 qtime=1286220431 etime=1286220431
                         start=1286220834 owner=chris...@bloom
                         exec_host=compute-0-31/0
                         Resource_List.neednodes=compute-0-31
                         Resource_List.nodect=1 Resource_List.nodes=1
                         Resource_List.walltime=00:00:10 
10/04/2010 12:33:54  A    user=chrisbee group=chrisbee jobname=STDIN queue=short
                         ctime=1286220431 qtime=1286220431 etime=1286220431
                         start=1286220834 owner=chris...@bloom
                         exec_host=compute-0-31/0 Resource_List.neednodes=1
                         Resource_List.nodect=1 Resource_List.nodes=1
                         Resource_List.walltime=00:00:10 session=7699
                         end=1286220834 Exit_status=0
                         resources_used.cput=00:00:00 resources_used.mem=0kb
                         resources_used.vmem=0kb resources_used.walltime=00:00:00
##############################

checkjob reports that the job can run right away (128 procs available, 1 
required), but tracejob shows it was queued at 12:27:11 and did not start 
until 12:33:54, almost 7 minutes later.
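
For reference, the exact delay can be worked out from the epoch timestamps in 
the accounting ("A") record of the tracejob output.  The snippet below is only 
a rough sketch: the helper name queue_delay_seconds is made up for 
illustration and is not part of TORQUE or Maui, and it assumes the fields 
appear as plain key=value pairs as shown in the output.

```python
import re

# Timestamp fields copied from the accounting ("A") record in the
# tracejob output (seconds since the Unix epoch).
record = "ctime=1286220431 qtime=1286220431 etime=1286220431 start=1286220834"


def queue_delay_seconds(record: str) -> int:
    """Return the seconds between eligibility (etime) and start.

    Hypothetical helper for illustration only; it assumes every field
    is an integer key=value pair.
    """
    fields = dict(re.findall(r"(\w+)=(\d+)", record))
    return int(fields["start"]) - int(fields["etime"])


delay = queue_delay_seconds(record)
minutes, seconds = divmod(delay, 60)
print(f"queued for {delay} s ({minutes} min {seconds} s)")
# prints: queued for 403 s (6 min 43 s)
```

This agrees with the 12:27:11 and 12:33:54 timestamps in the server ("S") 
records.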

I'm using TORQUE 2.3.6 and Maui 3.2.6p21.

Any help in sorting out why these jobs don't start right away would be greatly 
appreciated.

Thanks,
Chris


-- 
Chris Berthiaume
Center for Environmental Genomics
University of Washington
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
