I can't see anything unusual in the tracejob output, apart from the obvious fact that, as you say, the job keeps trying to run and being requeued (thousands of times). The mom_logs didn't show anything interesting either.

The pbs_server logs show a lot of this, though, and I'm not sure what it means:

09/24/2008 15:54:36;0040;PBS_Server;Svr;steel.mib.man.ac.uk;Scheduler sent command new
09/24/2008 15:54:37;0008;PBS_Server;Job;22.steel.mib.man.ac.uk;Job Modified at request of [EMAIL PROTECTED]
09/24/2008 15:54:37;0008;PBS_Server;Job;22.steel.mib.man.ac.uk;Job Run at request of [EMAIL PROTECTED]
09/24/2008 15:54:37;0008;PBS_Server;Job;22.steel.mib.man.ac.uk;Job Modified at request of [EMAIL PROTECTED]
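
To get a feel for how often the server is being asked to start the job, records like the ones above could be tallied per job id. This is only a rough sketch in Python: it assumes each record is a single line with six semicolon-separated fields, as in the excerpt, and the function name count_job_runs is made up for illustration, not part of TORQUE.

```python
from collections import Counter


def count_job_runs(lines):
    """Count 'Job Run' records per job id in pbs_server log lines.

    Assumes the layout seen in the excerpt:
    timestamp;code;daemon;record type;object id;message
    """
    runs = Counter()
    for line in lines:
        # Split into at most six fields so semicolons in the message survive.
        fields = line.rstrip("\n").split(";", 5)
        if len(fields) < 6:
            continue  # not a complete record, skip it
        _timestamp, _code, _daemon, kind, job_id, message = fields
        if kind == "Job" and message.startswith("Job Run"):
            runs[job_id] += 1
    return runs


# The four records quoted above, one per line.
sample = [
    "09/24/2008 15:54:36;0040;PBS_Server;Svr;steel.mib.man.ac.uk;Scheduler sent command new",
    "09/24/2008 15:54:37;0008;PBS_Server;Job;22.steel.mib.man.ac.uk;Job Modified at request of [EMAIL PROTECTED]",
    "09/24/2008 15:54:37;0008;PBS_Server;Job;22.steel.mib.man.ac.uk;Job Run at request of [EMAIL PROTECTED]",
    "09/24/2008 15:54:37;0008;PBS_Server;Job;22.steel.mib.man.ac.uk;Job Modified at request of [EMAIL PROTECTED]",
]

if __name__ == "__main__":
    for job_id, n in count_job_runs(sample).most_common():
        print(job_id, n)
```

Fed a whole day's server log instead of the sample, a count in the thousands for one job id would match the high start_count.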



Quoting "Craig West" <[EMAIL PROTECTED]>:


Philip,

I think you will find the job or a node has an error. It is being
continuously restarted. Notice the start_count variable is high, also
there is the exit_status variable which doesn't usually appear until
the job has exited (at least once).

I think the job is being continuously re-queued. You may want to put a
hold on it, or delete it until you can understand why.

I would check the logs on the server to see which nodes it is trying to
run on, then check those nodes to see if there is a problem.
"tracejob <jobid>" should show some useful information, but needs to be
run by root to get detailed information. The "exec_host" variable will
tell you which nodes it is trying to run on.

Craig.


   etime = Wed Sep 24 14:08:27 2008
   exit_status = -3
   submit_args = qsubtest.com
   start_time = Wed Sep 24 14:08:28 2008
   start_count = 1756
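
The two signs Craig mentions (a high start_count, and an exit_status on a job that is still queued) could be checked mechanically. A minimal sketch in Python, assuming the "name = value" layout shown in the excerpt above; the helper names and the threshold of 10 are arbitrary choices for illustration, and long values that the real output wraps onto continuation lines are not handled.

```python
def parse_attributes(text):
    """Turn 'name = value' lines into a dict, ignoring anything else."""
    attrs = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:  # only keep lines that actually contain ' = '
            attrs[key.strip()] = value.strip()
    return attrs


def looks_like_requeue_loop(attrs, threshold=10):
    """Heuristic: many starts plus a recorded exit_status.

    The threshold is an illustrative guess, not a TORQUE setting.
    """
    start_count = int(attrs.get("start_count", 0))
    return start_count > threshold and "exit_status" in attrs


# The attribute excerpt quoted above.
sample = """\
   etime = Wed Sep 24 14:08:27 2008
   exit_status = -3
   submit_args = qsubtest.com
   start_time = Wed Sep 24 14:08:28 2008
   start_count = 1756
"""

if __name__ == "__main__":
    attrs = parse_attributes(sample)
    print(attrs["start_count"], attrs["exit_status"])
    print(looks_like_requeue_loop(attrs))
```
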


I don't understand the priority being zero, as maui lists the startpriority as 60. Something appears to be not communicating somewhere. Could someone shed some light on it?

_______________________________________________
torqueusers mailing list
[EMAIL PROTECTED]
http://www.supercluster.org/mailman/listinfo/torqueusers



_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
