As an addition to the last email: it seems that only multinode jobs
are getting stuck. Multinode jobs are allowed in maui, though,
and I can't see a setting in torque for that!
Quoting "Craig West" <[EMAIL PROTECTED]>:
Philip,
I think you will find the job or a node has an error. It is being
continuously restarted. Notice the start_count variable is high, also
there is the exit_status variable which doesn't usually appear until
the job has exited (at least once).
I think the job is being continuously re-queued. You may want to put a
hold on it, or delete it until you can understand why.
I would check the logs on the server to see which nodes it is trying to
run on, then check those nodes to see if there is a problem.
"tracejob <jobid>" should show some useful information, but needs to be
run by root to get detailed information. The "exec_host" variable will
tell you which nodes it is trying to run on.
Craig.
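To make the exec_host check above concrete, here is a minimal Python
sketch that turns an exec_host value into a list of node names to
inspect. It assumes the usual "node/slot+node/slot" layout of that
attribute; the function name and the sample value are hypothetical and
do not come from the job in this thread.

```python
def nodes_from_exec_host(exec_host):
    """Return the unique node names in an exec_host value, in first-seen order."""
    nodes = []
    for slot in exec_host.split("+"):
        # Each entry is assumed to look like "nodename/slotindex".
        name = slot.split("/", 1)[0].strip()
        if name and name not in nodes:
            nodes.append(name)
    return nodes


# Hypothetical value for illustration only.
print(nodes_from_exec_host("node01/1+node01/0+node02/1+node02/0"))
# -> ['node01', 'node02']
```

Each name returned is a node whose pbs_mom logs are worth checking.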
etime = Wed Sep 24 14:08:27 2008
exit_status = -3
submit_args = qsubtest.com
start_time = Wed Sep 24 14:08:28 2008
start_count = 1756
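As a rough illustration of reading these attributes, the Python sketch
below parses "name = value" lines like the ones above and flags a job
that looks stuck in a requeue loop (high start_count plus an
exit_status already recorded). The helper names and the threshold of 5
starts are assumptions for illustration. The exit-code meanings are as
I recall them from the TORQUE documentation and should be verified
against your version. Real "qstat -f" output also wraps long values
across lines, which this sketch does not handle.

```python
QSTAT_SNIPPET = """\
etime = Wed Sep 24 14:08:27 2008
exit_status = -3
submit_args = qsubtest.com
start_time = Wed Sep 24 14:08:28 2008
start_count = 1756
"""

# Assumed meanings of negative exit codes; check the TORQUE docs for your release.
EXIT_HINTS = {
    -1: "job exec failed before files were staged, no retry",
    -2: "job exec failed after files were staged, no retry",
    -3: "job exec failed, will be retried",
}


def parse_attributes(text):
    """Parse 'name = value' lines into a dict of strings."""
    attrs = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:
            attrs[key.strip()] = value.strip()
    return attrs


def looks_requeued(attrs, max_starts=5):
    """True if the job has started many times and already has an exit status."""
    starts = int(attrs.get("start_count", 0))
    return starts > max_starts and "exit_status" in attrs


attrs = parse_attributes(QSTAT_SNIPPET)
print(looks_requeued(attrs))
print(EXIT_HINTS.get(int(attrs["exit_status"]), "unknown"))
```

If the -3 reading is right, it matches a job that keeps being retried,
which would explain the start_count of 1756.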
I don't understand the priority being zero, as maui lists the
startpriority as 60. Something appears to be not communicating
somewhere. Could someone shed some light on it?
_______________________________________________
torqueusers mailing list
[EMAIL PROTECTED]
http://www.supercluster.org/mailman/listinfo/torqueusers
_______________________________________________
mauiusers mailing list
[EMAIL PROTECTED]
http://www.supercluster.org/mailman/listinfo/mauiusers