Hi,
What is an alternate way of reaping the exit status and resource
utilization of a finished job in a TORQUE cluster?
As far as I know there is none (maybe some C API?). TORQUE writes a
job's utilization info directly into the job accounting log and, in
part, into the server log.
However, TORQUE can keep a completed job in memory for a configurable
amount of time instead of removing it immediately, so that the qstat
command can still display its utilization, including the exit status.
This feature is not enabled by default in the TORQUE server and has to
be configured (option keep_completed):
http://docs.adaptivecomputing.com/torque/6-0-1/help.htm#topics/torque/2-jobs/keepingCompletedJobs.htm?Highlight=keep_completed
We have it enabled here, and I guess most TORQUE sites do the same.
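For reference, a minimal sketch of how the option can be set with qmgr,
assuming you have manager rights on the TORQUE server. The value is in
seconds; 300 is only an illustrative choice, not a recommendation.

```shell
# Keep completed jobs visible to qstat for 300 seconds (illustrative value).
qmgr -c "set server keep_completed = 300"

# Check the resulting server configuration.
qmgr -c "print server" | grep keep_completed
```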
In that case the qstat output looks as follows. Note the character 'C'
(completed) in the job state (S) column:
qstat -a 1097122

                                                                                  Req'd      Req'd        Elap
Job ID                  Username    Queue    Jobname          SessID   NDS    TSK Memory      Time S      Time
----------------------- ----------- -------- ---------------- ------ ----- ------ ------ --------- - ---------
1097122                 abcd        all      UWOT_O_Grid_PDE    7071     4     24   48gb  10:00:00 C        --
The command qstat -f displays the job's exit status:
qstat -f 1097122
.....
etime = Wed Jun 1 22:02:27 2016
exit_status = 0
submit_args = abc
....
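If the value has to be picked up by a script, something along these
lines might do. It is only a sketch: job_attr is a made-up helper name
(not part of TORQUE), and it assumes the "name = value" layout shown
above, with the value on a single line.

```shell
#!/bin/sh
# job_attr NAME: read `qstat -f` output on stdin and print the value of
# the attribute NAME. Hypothetical helper, not a TORQUE command.
job_attr() {
    awk -v key="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)          # drop leading indentation
            pos = index(line, " = ")          # split at the first " = "
            if (pos > 0 && substr(line, 1, pos - 1) == key) {
                print substr(line, pos + 3)   # everything after " = "
                exit
            }
        }
    '
}

# Usage against a live server (the job must still be kept by keep_completed):
#   qstat -f 1097122 | job_attr exit_status
```

The same helper should also work for the utilization attributes such as
resources_used.walltime or resources_used.mem, as long as the job is
still known to the server.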
Best,
Gizo
Ciao,
R
--
Riccardo Murri
http://www.s3it.uzh.ch/about/team/#Riccardo.Murri
S3IT: Services and Support for Science IT
University of Zurich
Winterthurerstrasse 190, CH-8057 Zürich (Switzerland)
Tel: +41 44 635 4208
Fax: +41 44 635 6888
--
Dr. Gizo Nanava
Leibniz Universitaet IT Services
Leibniz Universitaet Hannover
Schlosswender Str. 5
D-30159 Hannover
Tel +49 511 762 7919085
http://www.luis.uni-hannover.de