Hello,
I have jobs that should start running immediately on available resources, but at
the moment they sit in the queue for long periods of time,
anywhere from 7 minutes to over 12 hours. On the cluster in question only half
of the nodes are in use by other jobs, so any new single-core job
should start immediately. For example, here is the checkjob and tracejob
output for a 10-second job I submitted as a test:
# CHECKJOB OUTPUT ###########
checking job 2066650
State: Idle
Creds: user:chrisbee group:chrisbee class:short qos:DEFAULT
WallTime: 00:00:00 of 00:00:10
SubmitTime: Mon Oct 4 12:27:11
(Time Queued Total: 00:00:01 Eligible: 00:00:01)
Total Tasks: 1
Req[0] TaskCount: 1 Partition: ALL
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [16GB]
IWD: [NONE] Executable: [NONE]
Bypass: 0 StartCount: 0
PartitionMask: [ALL]
Flags: RESTARTABLE
PE: 1.00 StartPriority: 91
job can run in partition DEFAULT (128 procs available. 1 procs required)
##############################
# TRACEJOB OUTPUT ###########
Job: 2066650
10/04/2010 12:27:11 S enqueuing into route, state 1 hop 1
10/04/2010 12:27:11 S dequeuing from route, state QUEUED
10/04/2010 12:27:11 S enqueuing into short, state 1 hop 1
10/04/2010 12:27:11 S Job Queued at request of chris...@bloom,
owner = chris...@bloom, job name = STDIN,
queue = short
10/04/2010 12:27:11 A queue=route
10/04/2010 12:27:11 A queue=short
10/04/2010 12:33:54 S Job Modified at request of m...@bloom
10/04/2010 12:33:54 S Job Run at request of m...@bloom
10/04/2010 12:33:54 S Job Modified at request of m...@bloom
10/04/2010 12:33:54 S Exit_status=0 resources_used.cput=00:00:00
resources_used.mem=0kb
resources_used.vmem=0kb
resources_used.walltime=00:00:00
10/04/2010 12:33:54 A user=chrisbee group=chrisbee jobname=STDIN queue=short
ctime=1286220431 qtime=1286220431 etime=1286220431
start=1286220834 owner=chris...@bloom
exec_host=compute-0-31/0
Resource_List.neednodes=compute-0-31
Resource_List.nodect=1 Resource_List.nodes=1
Resource_List.walltime=00:00:10
10/04/2010 12:33:54 A user=chrisbee group=chrisbee jobname=STDIN queue=short
ctime=1286220431 qtime=1286220431 etime=1286220431
start=1286220834 owner=chris...@bloom
exec_host=compute-0-31/0 Resource_List.neednodes=1
Resource_List.nodect=1 Resource_List.nodes=1
Resource_List.walltime=00:00:10 session=7699
end=1286220834
Exit_status=0 resources_used.cput=00:00:00
resources_used.mem=0kb
resources_used.vmem=0kb
resources_used.walltime=00:00:00
##############################
checkjob reports that the job can run right away (128 procs available, 1
required), but it took almost 7 minutes (queued at 12:27:11, run at 12:33:54)
just to start running.
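As a cross-check on that delay, the wait can be recomputed from the epoch
timestamps in the accounting record above. This is only a minimal Python
sketch: parse_fields and queue_wait are helper names made up for illustration
(they are not part of TORQUE or Maui), and the record string is copied by hand
from the tracejob output rather than read from the accounting logs.

```python
import re
from datetime import timedelta

# Timestamp fields copied by hand from the tracejob accounting record
# for job 2066650 (values are seconds since the Unix epoch).
record = (
    "ctime=1286220431 qtime=1286220431 etime=1286220431 "
    "start=1286220834 end=1286220834"
)


def parse_fields(text):
    """Return the numeric key=value pairs in a record as a dict of ints."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)\b", text)}


def queue_wait(fields):
    """Seconds between entering the queue (qtime) and starting (start)."""
    return fields["start"] - fields["qtime"]


fields = parse_fields(record)
wait = queue_wait(fields)
print(wait, "seconds =", timedelta(seconds=wait))  # 403 seconds = 0:06:43
```

So the job waited 6 minutes 43 seconds in the queue and then finished within
the same second it started (start and end are identical).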
I'm using TORQUE version 2.3.6 and Maui version 3.2.6p21.
Any help in sorting out why these jobs don't start right away would be greatly
appreciated.
Thanks,
Chris
--
Chris Berthiaume
Center for Environmental Genomics
University of Washington